- Anirudh Ravula
- Bhargav Kanagal
- Joshua Ainslie
- Ruining He
(2021)
Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants across model sizes on a wide spectrum of tasks and benchmarks, including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. Qualitatively, RealFormer stabilizes training and leads to models with sparser attention. Code and pre-trained checkpoints will be open-sourced.
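The core idea behind the residual attention layer is to add a skip connection on the raw (pre-softmax) attention scores, so each layer reuses and refines the scores of the layer below. Below is a minimal, illustrative sketch of that idea using standard scaled dot-product attention; the function name, shapes, and usage are assumptions for demonstration, not the authors' reference implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention(q, k, v, prev_scores=None):
    """Scaled dot-product attention with a residual edge on the raw
    attention scores: the pre-softmax scores of the previous layer are
    added to the current layer's scores and also passed on upward."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)  # (..., seq_q, seq_k)
    if prev_scores is not None:
        scores = scores + prev_scores             # residual attention edge
    out = softmax(scores) @ v
    return out, scores                            # scores feed the next layer

# Illustrative usage: chain the residual scores through two layers.
rng = np.random.default_rng(0)
q = k = v = rng.normal(size=(2, 8, 16))           # (batch, seq, head_dim)
out1, scores1 = residual_attention(q, k, v)
out2, scores2 = residual_attention(q, k, v, prev_scores=scores1)
```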