- Santiago Ontanon
- Joshua Ainslie
- Vaclav Cvicek
- Zach Fisher
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics (2022), pp. 3591-3607
Several studies have reported the inability of Transformer models to generalize compositionally. In this paper, we explore the design space of Transformer models, showing that several design decisions, such as position encodings, decoder type, model architecture, and encoding of the target task, imbue Transformers with different inductive biases, leading to better or worse compositional generalization. In particular, we show that Transformers can generalize compositionally significantly better than previously reported in the literature if configured appropriately.