Making Transformers Solve Compositional Tasks

Joshua Ainslie
Vaclav Cvicek
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics (2022), pp. 3591-3607

Abstract

Several studies have reported the inability of Transformer models to generalize compositionally. In this paper, we explore the design space of Transformer models, showing that several design decisions, such as the position encodings, decoder type, model architecture, and encoding of the target task, imbue Transformers with different inductive biases, leading to better or worse compositional generalization. In particular, we show that Transformers can generalize compositionally significantly better than previously reported in the literature if configured appropriately.
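To make the explored design space concrete, the sketch below enumerates the kinds of configuration axes the abstract names (position encodings, decoder type, model architecture, and target-task encoding) as a simple Python configuration object. All field names and option values are illustrative assumptions, not the authors' actual code or API.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Position encodings: e.g. absolute learned embeddings vs. relative encodings.
    position_encoding: str = "relative"    # hypothetical options: "absolute" | "relative"
    # Decoder type: e.g. an autoregressive sequence decoder vs. an alternative decoder head.
    decoder_type: str = "sequence"         # hypothetical options: "sequence" | "copy"
    # Model architecture: depth, width, and whether layer weights are shared.
    num_layers: int = 6
    num_heads: int = 8
    share_layer_weights: bool = False
    # Encoding of the target task: how outputs are represented for the model.
    target_encoding: str = "sequence"      # hypothetical options: "sequence" | "intermediate"

# Illustrative usage: sweep over configurations and compare their
# compositional generalization on held-out compositional splits.
configs = [
    TransformerConfig(position_encoding=pe, decoder_type=dt)
    for pe in ("absolute", "relative")
    for dt in ("sequence", "copy")
]
```

Each configuration corresponds to one point in the design space; the paper's claim is that these choices act as inductive biases, so comparing such points on compositional splits reveals which configurations generalize better.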
