Zach Fisher

Authored Publications
    Making Transformers Solve Compositional Tasks
    Joshua Ainslie
    Vaclav Cvicek
    Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics (2022), pp. 3591-3607
    Abstract: Several studies have reported the inability of Transformer models to generalize compositionally. In this paper, we explore the design space of Transformer models, showing that several design decisions, such as the position encodings, decoder type, model architecture, and encoding of the target task, imbue Transformers with different inductive biases, leading to better or worse compositional generalization. In particular, we show that Transformers can generalize compositionally significantly better than previously reported in the literature if configured appropriately.
    LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models
    Joshua Ainslie
    Vaclav Cvicek
    ICLR 2022 Workshop on Elements of Reasoning: Objects, Structure and Causality
    Abstract: Machine learning models such as Transformers or LSTMs struggle with tasks that are compositional in nature, such as those involving reasoning or inference. Although many datasets exist to evaluate compositional generalization, options are more limited when it comes to evaluating inference abilities. This paper presents LogicInference, a new dataset to evaluate the ability of models to perform logical inference. The dataset focuses on inference using propositional logic and a small subset of first-order logic, represented both in semi-formal logical notation and in natural language. We also report initial results using a collection of machine learning models to establish a baseline on this dataset.
    ETC: Encoding Long and Structured Inputs in Transformers
    Anirudh Ravula
    Joshua Ainslie
    Qifan Wang
    Vaclav Cvicek
    2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
    Abstract: Transformer models have advanced the state of the art in many NLP tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key limitations of existing architectures, namely: scaling input length, and ingesting structured inputs. The main innovation is a new global-local attention mechanism between a global memory and the input tokens, which allows scaling attention to longer inputs. We show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training task allows ETC to naturally handle structured data. We achieve new state-of-the-art results on two natural language datasets requiring long and/or structured inputs.