ETC: Encoding Long and Structured Inputs in Transformers

Anirudh Ravula
Joshua Ainslie
Qifan Wang
Vaclav Cvicek
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

Abstract

Transformer models have advanced the state of the art in many NLP tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key limitations of existing architectures: scaling input length and ingesting structured inputs. The main innovation is a new global-local attention mechanism between a global memory and the input tokens, which allows attention to scale to longer inputs. We show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training task allows ETC to naturally handle structured data. We achieve new state-of-the-art results on two natural language datasets requiring long and/or structured inputs.
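To illustrate the attention pattern the abstract describes, the sketch below builds a global-local attention mask in which global tokens attend to everything while long-input tokens attend to all global tokens plus a local window of neighbors, then applies standard single-head scaled dot-product attention under that mask. This is a minimal, illustrative sketch, not the paper's implementation: the function names and parameters (global_local_attention_mask, n_global, local_radius) are assumptions, and a dense mask is used for clarity even though the efficiency of ETC comes from never materializing the full quadratic attention matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention_mask(n_global, n_long, local_radius):
    """Boolean mask (True = may attend), with global tokens first.

    Global tokens attend to all tokens; long-input tokens attend to
    every global token plus a window of +/- local_radius long tokens.
    (Illustrative only: a real implementation would compute the banded
    long-to-long block directly rather than a dense n x n mask.)
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_global, :] = True              # global-to-everything
    mask[n_global:, :n_global] = True      # long-to-global
    for i in range(n_long):                # long-to-long local band
        lo = max(0, i - local_radius)
        hi = min(n_long, i + local_radius + 1)
        mask[n_global + i, n_global + lo:n_global + hi] = True
    return mask

def attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # block masked positions
    return softmax(scores, axis=-1) @ v

# Toy usage: 4 global tokens, 16 long-input tokens, local radius 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(4 + 16, 8))
mask = global_local_attention_mask(n_global=4, n_long=16, local_radius=2)
out = attention(x, x, x, mask)
print(out.shape)  # (20, 8)
```

Because every long token can reach every other long token within two hops through the global memory, this pattern keeps per-token cost bounded by the window size and the (small) number of global tokens while preserving long-range information flow.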

Research Areas