Google Research

RealFormer: Transformer Likes Residual Attention

(2021)

Abstract

Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants across model sizes on a wide spectrum of tasks/benchmarks, including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. Qualitatively, RealFormer stabilizes training and leads to models with sparser attention. Code and pre-trained checkpoints will be open-sourced.
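The core idea of RealFormer, as described in the paper, is a residual "skip edge" over attention scores: each layer's raw pre-softmax attention logits are added to the next layer's logits before the softmax. A minimal single-head sketch of this idea in numpy (function and variable names are ours, not from the released code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(Q, K, V, prev_scores=None):
    """Single-head scaled dot-product attention with a RealFormer-style
    residual connection on the raw (pre-softmax) attention scores.

    Returns the attention output and this layer's combined scores,
    which the next layer receives as `prev_scores`.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq) raw attention logits
    if prev_scores is not None:
        scores = scores + prev_scores    # residual edge over attention scores
    weights = softmax(scores, axis=-1)   # rows sum to 1
    return weights @ V, scores
```

Stacking layers then just threads `scores` from one layer into the next, so every layer refines the accumulated attention logits rather than computing them from scratch.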