LongT5: Efficient Text-To-Text Transformer for Long Sequences

Mandy Guo; Joshua Ainslie; David Uthus; Santiago Ontanon; Jianmo Ni; Yun-Hsuan Sung; Yinfei Yang

Findings of the Association for Computational Linguistics: NAACL 2022, Association for Computational Linguistics

Abstract

Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call Transient Global (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
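The abstract does not spell out how TGlobal works, so the following is only a rough sketch of a local/global attention pattern that needs no side-inputs. It assumes the input is split into fixed-size blocks, that each block is summarized into one "global" token by summing and normalizing its token embeddings, and that every token attends to a local window plus all block summaries. The function names, the L2 normalization, and the handling of a partial final block are illustrative assumptions, not details taken from this page.

```python
import math


def transient_global_tokens(embeddings, block_size):
    """Summarize each block of token embeddings into one global token.

    Assumption: sum the embeddings in a block, then L2-normalize the sum
    (a stand-in for whatever normalization the real model applies).
    """
    global_tokens = []
    for start in range(0, len(embeddings), block_size):
        block = embeddings[start:start + block_size]
        summed = [sum(dim) for dim in zip(*block)]
        norm = math.sqrt(sum(x * x for x in summed)) or 1.0
        global_tokens.append([x / norm for x in summed])
    return global_tokens


def attended_keys(position, seq_len, local_radius, block_size):
    """Return (local token indices, global block indices) visible to one token."""
    lo = max(0, position - local_radius)
    hi = min(seq_len, position + local_radius + 1)
    num_blocks = math.ceil(seq_len / block_size)
    return list(range(lo, hi)), list(range(num_blocks))


# Toy input: 8 tokens with 2-dimensional embeddings, blocks of 4 tokens.
toy_embeddings = [[1.0, 0.0]] * 4 + [[0.0, 2.0]] * 4
toy_globals = transient_global_tokens(toy_embeddings, block_size=4)
toy_local, toy_global_ids = attended_keys(3, seq_len=8, local_radius=1, block_size=4)
```

Under these assumptions each token sees at most `2 * local_radius + 1` neighbors plus `ceil(seq_len / block_size)` block summaries instead of all `seq_len` tokens. The global tokens are recomputed from the input itself, which is the sense in which no extra side-inputs are required.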

Research Areas

Natural language processing
