ETC: Encoding Long and Structured Inputs in Transformers

Anirudh Ravula
Joshua Ainslie
Li Yang
Qifan Wang
Vaclav Cvicek
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

Abstract

Transformer models have advanced the state of the art in many NLP tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key limitations of existing architectures: scaling to long inputs and ingesting structured inputs. The main innovation is a new global-local attention mechanism between a global memory and the input tokens, which allows attention to scale to longer inputs. We show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training task allows ETC to naturally handle structured data. We achieve new state-of-the-art results on two natural language datasets requiring long and/or structured inputs.
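
The abstract does not specify the exact attention pattern, so the following is only a minimal sketch of how an attention mask between a small global memory and a long input might be built. It assumes (this is an illustration, not the paper's stated design) that global tokens attend everywhere and that each input token attends to all global tokens plus a local window of `radius` neighbours. The names `global_local_mask` and `allowed_pairs` are hypothetical.

```python
def global_local_mask(n_global, n_long, radius):
    """Return a boolean attention mask over [global tokens | long-input tokens].

    mask[i][j] is True when position i may attend to position j.
    Assumption (illustrative, not taken from the abstract): global tokens
    attend everywhere, and each long-input token attends to every global
    token plus long-input tokens within `radius` positions of itself.
    """
    n = n_global + n_long
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_global or j < n_global:
                # Any pair involving a global token is allowed.
                mask[i][j] = True
            else:
                # Long-to-long attention is restricted to a local window.
                mask[i][j] = abs(i - j) <= radius
    return mask


def allowed_pairs(mask):
    """Count the query-key pairs the mask permits."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    mask = global_local_mask(n_global=4, n_long=64, radius=2)
    full = (4 + 64) ** 2
    print(allowed_pairs(mask), "of", full, "pairs allowed")
```

Under these assumptions, with a fixed number of global tokens and a fixed radius, the number of permitted pairs grows roughly linearly with the input length rather than quadratically, which is the intuition behind scaling attention to longer inputs.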