Authored Publications
Making Transformers Solve Compositional Tasks
Joshua Ainslie
Vaclav Cvicek
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics (2022), pp. 3591-3607
Several studies have reported the inability of Transformer models to generalize compositionally. In this paper, we explore the design space of Transformer models, showing that several design decisions, such as the position encodings, decoder type, model architecture, and encoding of the target task, imbue Transformers with different inductive biases, leading to better or worse compositional generalization. In particular, we show that Transformers can generalize compositionally significantly better than previously reported in the literature if configured appropriately.
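One of the design decisions the abstract names is the position encoding. A minimal sketch of a relative-position bias in single-head self-attention, assuming a clipped per-distance bias table (the shapes and names here are illustrative, not the paper's exact formulation):

```python
import numpy as np

def attention_with_relative_bias(q, k, v, max_rel=4, rng=None):
    """Single-head self-attention with a relative-position bias.

    q, k, v: (seq_len, d) arrays. The bias table holds one scalar per
    clipped relative distance in [-max_rel, max_rel]; a trained model
    would learn these values, here they are randomly initialized.
    """
    rng = rng or np.random.default_rng(0)
    seq_len, d = q.shape
    bias_table = rng.normal(size=2 * max_rel + 1)
    pos = np.arange(seq_len)
    # Clip relative distances, shift into table-index range.
    rel = np.clip(pos[None, :] - pos[:, None], -max_rel, max_rel) + max_rel
    scores = q @ k.T / np.sqrt(d) + bias_table[rel]
    # Row-wise softmax over attention scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the bias depends only on the distance between tokens rather than their absolute positions, this kind of encoding is one candidate for a more composition-friendly inductive bias.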
LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models
Joshua Ainslie
Vaclav Cvicek
ICLR 2022 Workshop on Elements of Reasoning: Objects, Structure and Causality
Machine learning models such as Transformers or LSTMs struggle with tasks that are compositional in nature, such as those involving reasoning and inference. Although many datasets exist to evaluate compositional generalization, when it comes to evaluating inference abilities, options are more limited. This paper presents LogicInference, a new dataset to evaluate the ability of models to perform logical inference. The dataset focuses on inference using propositional logic and a small subset of first-order logic, represented both in semi-formal logical notation and in natural language. We also report results from a collection of machine learning models to establish initial baselines on this dataset.
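To make the task concrete, a hypothetical sketch of what one natural-language seq2seq training pair for a propositional inference (modus ponens) might look like; the templates and wording are illustrative, not the dataset's actual format:

```python
def modus_ponens_example(p="it rains", q="the ground is wet"):
    """Build one (input, target) pair for a modus ponens inference.

    From "if p then q" and "p", a model should produce "q".
    """
    premise = f"If {p}, then {q}. We know that {p}."
    question = "What can we infer?"
    source = f"{premise} {question}"
    target = f"Therefore, {q}."
    return source, target

src, tgt = modus_ponens_example()
# src: "If it rains, then the ground is wet. We know that it rains. What can we infer?"
# tgt: "Therefore, the ground is wet."
```

A seq2seq model is then trained to map `src` to `tgt`, so evaluation measures whether it has learned the inference rule rather than memorized surface patterns.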
ETC: Encoding Long and Structured Inputs in Transformers
Anirudh Ravula
Joshua Ainslie
Li Yang
Qifan Wang
Vaclav Cvicek
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Transformer models have advanced the state of the art in many NLP tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key limitations of existing architectures, namely scaling input length and ingesting structured inputs. The main innovation is a new global-local attention mechanism between a global memory and the input tokens, which allows scaling attention to longer inputs. We show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training task allows ETC to naturally handle structured data. We achieve new state-of-the-art results on two natural language datasets requiring long and/or structured inputs.
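The global-local attention pattern can be sketched as an attention mask: global tokens attend everywhere, while long-input tokens attend to all global tokens plus a local window. This is an illustrative reconstruction under those assumptions, not the paper's exact implementation:

```python
import numpy as np

def global_local_attention_mask(n_global, n_long, radius):
    """Boolean attention mask in the spirit of ETC's global-local attention.

    Returns an (n, n) mask where entry [i, j] is True if token i may
    attend to token j. Global tokens come first, long-input tokens after.
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_global, :] = True            # global tokens attend to everything
    mask[:, :n_global] = True            # every token attends to global tokens
    idx = np.arange(n_long)
    local = np.abs(idx[:, None] - idx[None, :]) <= radius
    mask[n_global:, n_global:] = local   # long tokens: local window only
    return mask
```

Because each long-input token attends to only O(radius + n_global) positions instead of all n, the cost of attention grows linearly rather than quadratically in the long-input length, which is what makes scaling to longer inputs feasible.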