Nicholas FitzGerald

Nicholas FitzGerald is a Research Scientist at Google, working on semantic representations of language. He received a Ph.D. in 2018 from the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Before that, he completed an undergraduate degree in Cognitive Systems (Computational Intelligence and Design) at the University of British Columbia. His work on large-scale QA-SRL parsing received an honourable mention for best paper at ACL 2018. See his personal webpage for more information.
Authored Publications
    Mention Memory: incorporating textual knowledge into Transformers through entity mention attention
    Michiel de Jong
    Yury Zemlyanskiy
    10th International Conference on Learning Representations, ICLR 2022, Virtual Conference, April 25-29, 2022, OpenReview.net
    Abstract: Natural language understanding tasks such as open-domain question answering often require retrieving and assimilating factual information from multiple sources. We propose to address this problem by integrating a semi-parametric representation of a large text corpus into a Transformer model as a source of factual knowledge. Specifically, our method represents knowledge as a "mention memory" containing a dense vector representation of every entity mention in a corpus. The Transformer model accesses the information through internal memory layers in which each entity mention in the passage being read attends to the mention memory. This approach enables synthesis of and reasoning over many disparate sources of information within a single Transformer model. In experiments using a memory of approximately 150 million Wikipedia mentions, our model provides strong improvements in performance on several open-domain knowledge-intensive tasks, including the claim verification benchmarks FEVER and HoVer and several entity-based QA benchmarks. We also show that the model learns to attend to informative mentions without any direct supervision. Finally, we show that the model can be adapted to generalize to new, unseen entities by updating the memory, without retraining.
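    The memory-attention mechanism described above lends itself to a compact illustration. Below is a minimal sketch, not the paper's implementation: passage mention states attend over a precomputed table of mention vectors and mix the retrieved values back in. The function name, shapes, and the use of NumPy with exact top-k retrieval are assumptions made for the example.

```python
import numpy as np

def mention_memory_attention(mention_states, memory_keys, memory_values, top_k=32):
    """mention_states: [m, d] hidden states of passage mentions.
    memory_keys / memory_values: [N, d] precomputed vectors for corpus mentions."""
    scores = mention_states @ memory_keys.T                     # [m, N] similarity to every memory entry
    top = np.argpartition(-scores, top_k, axis=1)[:, :top_k]    # indices of the k best entries per mention
    top_scores = np.take_along_axis(scores, top, axis=1)        # [m, k]
    weights = np.exp(top_scores - top_scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)               # softmax over the retrieved entries
    retrieved = np.einsum('mk,mkd->md', weights, memory_values[top])
    return mention_states + retrieved                           # residual update of the mention states

# Toy usage with random stand-ins for encoder outputs and the memory table.
rng = np.random.default_rng(0)
mentions = rng.normal(size=(4, 64))
keys = rng.normal(size=(10_000, 64))
values = rng.normal(size=(10_000, 64))
print(mention_memory_attention(mentions, keys, values).shape)   # (4, 64)
```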
    MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network
    Jan A. Botha
    Dan Bikel
    Andrew McCallum
    Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online (2021), pp. 278-285
    Abstract: We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems, which represent each entity with a single vector, we build a contextualized mention-encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class prototypes", since inference involves retrieving from the full set of labeled entity mentions in the training set and applying the nearest mention neighbor's entity label. Our model is trained on a large multilingual corpus of mention pairs derived from Wikipedia hyperlinks, and performs nearest neighbor inference on an index of 700 million mentions. It is simpler to train, gives more interpretable predictions, and outperforms all other systems on two multilingual entity linking benchmarks.
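    As a rough sketch of the instance-based inference described above, with assumed shapes and a NumPy brute-force search standing in for the real encoder and 700M-mention index: a query mention vector is matched against labeled training mentions and inherits the nearest neighbor's entity label.

```python
import numpy as np

def link_by_nearest_mention(query_vec, index_vectors, index_entity_ids):
    """query_vec: [d] encoding of a mention in context;
    index_vectors: [N, d] encoded training mentions; index_entity_ids: their entity labels."""
    sims = index_vectors @ query_vec
    sims /= np.linalg.norm(index_vectors, axis=1) * np.linalg.norm(query_vec) + 1e-9  # cosine similarity
    nearest = int(np.argmax(sims))                       # nearest labeled mention
    return index_entity_ids[nearest], float(sims[nearest])

# Toy usage: hypothetical entity ids, random vectors in place of encoder outputs.
rng = np.random.default_rng(1)
index = rng.normal(size=(1000, 128))
ids = [f"Q{i % 50}" for i in range(1000)]
query = rng.normal(size=128)
print(link_by_nearest_mention(query, index, ids))
```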
    Abstract: Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context-dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine-readable knowledge bases or human-readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context-independent representations of entities from the text contexts in which those entities were mentioned. We show that large-scale training of neural models allows us to learn high-quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level typing benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we match the state of the art on CoNLL-Aida without linking-specific features and obtain a score of 89.8% on TAC-KBP 2010 without using any alias table, external knowledge base, or in-domain training data; and (4) answering trivia questions that uniquely identify entities. Our global entity representations encode fine-grained type categories, such as Scottish footballers, and can answer trivia questions such as: Who was the last inmate of Spandau jail in Berlin?
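    A minimal sketch of the fill-in-the-blank objective described above, assuming a precomputed encoding of the blanked context and a full softmax over the entity vocabulary; the real system's encoder and candidate handling are abstracted away.

```python
import numpy as np

def fill_in_the_blank_loss(context_vec, entity_table, true_entity_idx):
    """context_vec: [d] encoding of a context whose entity mention is blanked out;
    entity_table: [E, d] one context-independent embedding per entity."""
    logits = entity_table @ context_vec                  # score every candidate entity
    logits -= logits.max()                               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())    # log-softmax over the entity vocabulary
    return -log_probs[true_entity_idx]                   # cross-entropy for the gold entity

# Toy usage with random stand-ins for the encoder output and entity table.
rng = np.random.default_rng(2)
table = rng.normal(size=(5000, 64))
ctx = rng.normal(size=64)
print(fill_in_the_blank_loss(ctx, table, true_entity_idx=42))
```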
    Entities as Experts: Sparse Memory Access with Entity Supervision
    Thibault Févry
    Eunsol Choi
    EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing (to appear)
    Abstract: We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model, Entities as Experts (EAE), that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EAE's entity representations are learned directly from text. We show that EAE's learned representations capture sufficient knowledge to answer TriviaQA questions such as "Which Dr. Who villain has been played by Roger Delgado, Anthony Ainley, Eric Roberts?", outperforming an encoder-generator Transformer model with 10× the parameters. According to the LAMA knowledge probes, EAE contains more factual knowledge than a similarly sized BERT, as well as previous approaches that integrate external sources of entity knowledge. Because EAE associates parameters with specific entities, it only needs to access a fraction of its parameters at inference time, and we show that the correct identification and representation of entities is essential to EAE's performance.
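    The sparse-access idea can be pictured as follows. This is an assumption-laden illustration, not the EAE architecture itself: only entities actually linked to mentions in the input have their memory rows fetched and added back into the token representations, so most of the memory table is untouched at inference time.

```python
import numpy as np

def entity_memory_update(token_states, mention_spans, linked_entity_ids, entity_memory):
    """token_states: [T, d]; mention_spans: (start, end) token offsets of detected mentions;
    linked_entity_ids: one entity id per span; entity_memory: [E, d] entity memory table."""
    out = token_states.copy()
    for (start, end), ent in zip(mention_spans, linked_entity_ids):
        out[start:end] += entity_memory[ent]   # fetch only the mentioned entity's row
    return out

# Toy usage: two mentions linked to hypothetical entity ids 17 and 58.
rng = np.random.default_rng(3)
tokens = rng.normal(size=(12, 32))
memory = rng.normal(size=(100, 32))
print(entity_memory_update(tokens, [(2, 4), (7, 9)], [17, 58], memory).shape)  # (12, 32)
```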
    Abstract: In this work, we present an entity linking model which combines a Transformer architecture with large-scale pretraining from Wikipedia links. Our model achieves the state of the art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand which design choices are important for entity linking, including the choice of negative entity candidates, Transformer architecture, and input perturbations. Lastly, we present promising results in more challenging settings such as end-to-end entity linking and entity linking without in-domain training data.
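    One way to picture the role of negative entity candidates mentioned above is a simple candidate-ranking loss over the gold entity plus sampled negatives. The sketch below is illustrative only, with random vectors standing in for the dual-encoder outputs.

```python
import numpy as np

def candidate_ranking_loss(mention_vec, gold_vec, negative_vecs):
    """mention_vec, gold_vec: [d]; negative_vecs: [k, d] sampled negative entity encodings."""
    candidates = np.vstack([gold_vec[None, :], negative_vecs])   # gold entity sits at index 0
    logits = candidates @ mention_vec
    logits -= logits.max()                                       # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))           # softmax cross-entropy against index 0

# Toy usage with 16 negatives per mention.
rng = np.random.default_rng(4)
loss = candidate_ranking_loss(rng.normal(size=64), rng.normal(size=64), rng.normal(size=(16, 64)))
print(loss)
```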
    Abstract: Language modeling tasks, in which words are predicted on the basis of a local context, have been very effective for learning word embeddings and context-dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine-readable knowledge bases tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context-independent representations of entities from the contexts in which those entities were mentioned. We show that large-scale training of neural models allows us to learn extremely high-fidelity entity typing information, which we demonstrate with few-shot reconstruction of Wikipedia categories. Our learning approach is powerful enough to encode specialized topics such as Giro d'Italia cyclists.
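    Few-shot category reconstruction with context-independent entity embeddings can be pictured as prototype ranking. The sketch below is an assumption-based illustration, not the paper's evaluation code: average the embeddings of a few exemplar members of a category, then rank all other entities by similarity to that prototype.

```python
import numpy as np

def reconstruct_category(entity_table, exemplar_ids, top_n=10):
    """entity_table: [E, d] context-independent entity embeddings;
    exemplar_ids: a handful of known members of the target category."""
    prototype = entity_table[exemplar_ids].mean(axis=0)            # few-shot category prototype
    sims = entity_table @ prototype
    sims /= np.linalg.norm(entity_table, axis=1) * np.linalg.norm(prototype) + 1e-9  # cosine similarity
    exclude = set(exemplar_ids)
    ranked = [int(i) for i in np.argsort(-sims) if i not in exclude]
    return ranked[:top_n]                                          # predicted additional members

# Toy usage with three random exemplars of a hypothetical category.
rng = np.random.default_rng(5)
table = rng.normal(size=(2000, 64))
print(reconstruct_category(table, exemplar_ids=[3, 57, 901]))
```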
    Matching the Blanks: Distributional Similarity for Relation Learning
    Jeffrey Ling
    ACL 2019 - The 57th Annual Meeting of the Association for Computational Linguistics (2019) (to appear)
    Abstract: General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In this paper, we build on extensions of Harris' distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task-agnostic relation representations solely from entity-linked text. We show that these representations significantly outperform previous work on exemplar-based relation extraction (FewRel), even without using any of that task's training data. We also show that models initialized with our task-agnostic representations, and then tuned on supervised relation extraction datasets, significantly outperform previous methods on SemEval 2010 Task 8, KBP37, and TACRED.
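    The "matching the blanks" training signal can be sketched as a binary decision over pairs of relation encodings: two statements that mention the same entity pair should score higher than mismatched ones. The sketch below is illustrative only; the BERT-based encoder and the blanking of entity mentions are omitted, and the vectors are random stand-ins.

```python
import numpy as np

def matching_the_blanks_loss(rel_vec_a, rel_vec_b, same_entity_pair):
    """Binary logistic loss on the dot product of two relation-statement encodings."""
    score = float(rel_vec_a @ rel_vec_b)
    prob = 1.0 / (1.0 + np.exp(-score))                  # probability the two statements "match"
    label = 1.0 if same_entity_pair else 0.0
    return -(label * np.log(prob + 1e-9) + (1.0 - label) * np.log(1.0 - prob + 1e-9))

# Toy usage: the same encodings scored as a positive and as a negative pair.
rng = np.random.default_rng(6)
a, b = rng.normal(size=64), rng.normal(size=64)
print(matching_the_blanks_loss(a, b, same_entity_pair=True))
print(matching_the_blanks_loss(a, b, same_entity_pair=False))
```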
    Semantic Role Labeling with Neural Network Factors
    Oscar Täckström
    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15), Association for Computational Linguistics
    Abstract: We present a new method for semantic role labeling in which arguments and semantic roles are jointly embedded in a shared vector space for a given predicate. These embeddings belong to a neural network, whose output represents the potential functions of a graphical model designed for the SRL task. We consider both local and structured learning methods and obtain strong results on standard PropBank and FrameNet corpora with a straightforward product-of-experts model. We further show how the model can learn jointly from PropBank and FrameNet annotations to obtain additional improvements on the smaller FrameNet dataset.
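    The shared embedding idea can be pictured as a small neural factor that scores every semantic role against an embedded argument span; the dimensions and the single tanh layer below are assumptions for illustration, not the paper's exact network.

```python
import numpy as np

def role_potentials(arg_features, role_embeddings, W):
    """arg_features: [f] features of one candidate argument span;
    role_embeddings: [R, d] one vector per semantic role; W: [d, f] projection into the shared space."""
    arg_vec = np.tanh(W @ arg_features)      # embed the argument span into the shared space
    return role_embeddings @ arg_vec         # one potential per semantic role

# Toy usage: 8 roles, 50 argument features, 32-dimensional shared space.
rng = np.random.default_rng(7)
potentials = role_potentials(rng.normal(size=50), rng.normal(size=(8, 32)), rng.normal(size=(32, 50)))
print(int(potentials.argmax()))              # locally best-scoring role for this argument
```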