Livio Baldini Soares
Authored Publications
NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 (to appear)
Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10⁻⁶% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the-art dual encoder retriever that still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models) as a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This model architecture can leverage existing pre-trained checkpoints and can be fine-tuned for efficiently constructing document representations that do not require neural processing of queries.
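As an illustration of the serving-time setup described above, here is a minimal sketch of lexicalized scoring: per-document token weights are assumed to have been precomputed offline by the neural model, so each query is scored with dictionary lookups only. All names and weights below are hypothetical toys, not the NAIL model itself.

```python
def score(query_tokens: list[str], doc_weights: dict[str, float]) -> float:
    """Score a document by summing its precomputed weights for the query's tokens."""
    return sum(doc_weights.get(tok, 0.0) for tok in query_tokens)

# Toy index: token-weight maps produced offline by the model, one per document.
index = {
    "doc1": {"neural": 1.2, "reranker": 2.1, "gpu": 0.4},
    "doc2": {"bm25": 1.8, "lexical": 1.5, "cpu": 0.9},
}

query = ["lexical", "reranker"]
ranked = sorted(index, key=lambda d: score(query, index[d]), reverse=True)
print(ranked)  # documents ranked with no neural query processing
```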
1-Pager: One Pass Answer Generation and Evidence Retrieval
Palak Jain
The 2023 Conference on Empirical Methods in Natural Language Processing (2023) (to appear)
We present 1-PAGER, the first system that answers a question and retrieves evidence using a single Transformer-based model and decoding process. 1-PAGER incrementally partitions the retrieval corpus using constrained decoding to select a document and answer string, and we show that this is competitive with comparable retrieve-and-read alternatives according to both retrieval and answer accuracy metrics. 1-PAGER also outperforms the equivalent ‘closed-book’ question answering model by grounding predictions in an evidence corpus. While 1-PAGER is not yet on par with more expensive systems that read many more documents before generating an answer, we argue that it provides an important step toward attributed generation by folding retrieval into the sequence-to-sequence paradigm that is currently dominant in NLP. We also show that the search paths used to partition the corpus are easy to read and understand, paving a way forward for interpretable neural retrieval.
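To make the constrained-decoding idea concrete, a minimal sketch follows: generation is restricted to token sequences that exist in an index of corpus "search paths", so every decoded sequence maps back to real documents. The paths, trie lookup, and greedy scorer are hypothetical stand-ins, not the released system.

```python
# Hypothetical corpus index: each tuple is a decodable search path.
corpus_paths = [
    ("jupiter", "largest", "planet"),
    ("jupiter", "moons", "ganymede"),
    ("saturn", "rings", "ice"),
]

def allowed_next(prefix: tuple) -> set:
    """Tokens that keep the decoded prefix inside the corpus index."""
    return {p[len(prefix)] for p in corpus_paths
            if p[:len(prefix)] == prefix and len(p) > len(prefix)}

def decode(score_token, max_len: int = 3) -> tuple:
    """Greedy constrained decoding: pick the best allowed token at each step."""
    prefix = ()
    while len(prefix) < max_len:
        options = allowed_next(prefix)
        if not options:
            break
        prefix += (max(options, key=score_token),)
    return prefix

# A toy scorer standing in for the model's next-token probabilities.
print(decode(lambda tok: {"jupiter": 0.9, "moons": 0.8}.get(tok, 0.1)))
```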
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models
Pat Verga
Jianmo Ni
arXiv (2022)
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (How to measure attribution?, and How well do current state-of-the-art methods perform on attribution?), and give some hints as to how to address a third key question (How to build LLMs with attribution?).
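As a rough illustration of the evaluation setup, the sketch below scores a set of (question, answer, evidence) triples with an automatic attribution judge; the `supported` function is a hypothetical placeholder for an NLI model or a human rater, not the metric used in the paper.

```python
def supported(evidence: str, question: str, answer: str) -> bool:
    # Placeholder judge: a real setup would run an NLI model over the
    # question-answer pair and the cited evidence passage.
    return answer.lower() in evidence.lower()

def attribution_score(examples) -> float:
    """Fraction of answers judged to be supported by the evidence they cite."""
    hits = sum(supported(ev, q, a) for q, a, ev in examples)
    return hits / len(examples)

examples = [
    ("Who wrote Hamlet?", "Shakespeare",
     "Hamlet is a tragedy written by William Shakespeare."),
]
print(attribution_score(examples))  # 1.0 for this toy example
```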
Adaptable and Interpretable Neural Memory Over Symbolic Knowledge
Haitian Sun
Pat Verga
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics (2021), pp. 3678-3691
Past research has demonstrated that large neural language models (LMs) encode surprising amounts of factual information; however, augmenting or modifying this information requires modifying a corpus and retraining, which is computationally expensive. To address this problem, we develop a neural LM that includes an interpretable neuro-symbolic KB in the form of a “fact memory”. Each element of the fact memory is formed from a triple of vectors, where each vector corresponds to a KB entity or relation. Our LM improves performance on knowledge-intensive question-answering tasks, sometimes dramatically, including a 27 point increase in one setting of WebQuestionsSP over a state-of-the-art open-book model, despite using 5% of the parameters. Most interestingly, we demonstrate that the model can be modified, without any re-training, by updating the fact memory.
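A minimal sketch of the fact-memory idea, with hypothetical embeddings and dimensions: facts are (subject, relation, object) triples of vectors, a query is matched against subject-relation keys, and knowledge is edited by rewriting entries in the memory rather than retraining.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = {name: rng.standard_normal(8)
       for name in ["France", "Italy", "Paris", "Rome", "capital_of"]}

# Fact memory: (subject, relation, object) triples over KB entities/relations.
facts = [("France", "capital_of", "Paris"), ("Italy", "capital_of", "Rome")]

def key(subject: str, relation: str) -> np.ndarray:
    return np.concatenate([emb[subject], emb[relation]])

def lookup(query_vec: np.ndarray) -> str:
    """Return the object entity of the fact whose key best matches the query."""
    scores = [query_vec @ key(s, r) for s, r, _ in facts]
    return facts[int(np.argmax(scores))][2]

# In the real model the query comes from the LM's context encoding; here we
# probe the memory directly with the key for (France, capital_of).
print(lookup(key("France", "capital_of")))   # -> Paris

# The memory can be updated without retraining by editing a triple.
facts[0] = ("France", "capital_of", "Lyon")
print(lookup(key("France", "capital_of")))   # -> Lyon
```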
QED: A Linguistically Principled Framework for Explainable Question Answering
Eunsol Choi
TACL (2021)
A question answering system that in addition to providing an answer provides an explanation of the reasoning that leads to that answer has potential advantages in terms of debuggability, extensibility, and trust. To this end, we propose QED, a linguistically informed, extensible framework for explanations in question answering. A QED explanation specifies the relationship between a question and answer according to formal semantic notions such as referential equality, sentencehood, and entailment. We describe and publicly release an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset, and report baseline models on two tasks—post-hoc explanation generation given an answer, and joint question answering and explanation generation. In the joint setting, a promising result suggests that training on a relatively small amount of QED data can improve question answering. In addition to describing the formal, language-theoretic motivations for the QED approach, we describe a large user study showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.
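To show the shape of such an explanation, here is a small, hypothetical rendering of a QED-style record as a data structure; the field names are illustrative and not the released annotation schema.

```python
from dataclasses import dataclass

@dataclass
class ReferentialEquality:
    question_span: str   # mention in the question
    sentence_span: str   # coreferent mention in the selected sentence

@dataclass
class QEDExplanation:
    question: str
    selected_sentence: str        # passage sentence that entails the answer
    referential_equalities: list  # links between question and sentence spans
    answer_span: str

example = QEDExplanation(
    question="who wrote the novel moby dick",
    selected_sentence="Moby-Dick is an 1851 novel by Herman Melville.",
    referential_equalities=[
        ReferentialEquality("the novel moby dick", "Moby-Dick")],
    answer_span="Herman Melville",
)
print(example.answer_span)
```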
Evaluating Explanations: How much do explanations from teachers aid students?
Danish Pruthi
Rachit Bansal
Bhuwan Dhingra
Zachary Chase Lipton
Graham Neubig
Transactions of the Association for Computational Linguistics (TACL) (2021)
While many methods purport to explain predictions by highlighting salient features, what aims these explanations serve and how they ought to be evaluated often go unstated. In this work, we introduce a framework to quantify the value of explanations via the accuracy gains that they confer on a student model trained to simulate a teacher model. Crucially, the explanations are available to the student during training, but are not available at test time. Compared to prior proposals, our approach is less easily gamed, enabling principled, automatic, model-agnostic evaluation of attributions. Using our framework, we compare numerous attribution methods for text classification and question answering, and observe quantitative differences that are consistent (to a moderate to high degree) across different student model architectures and learning strategies.
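The sketch below restates the protocol in code: two students are trained to simulate the teacher, one with explanations attached during training, and the explanation method is credited with the resulting gain in simulation accuracy. The training and prediction callables are hypothetical placeholders for real model training.

```python
def simulation_accuracy(student, test_inputs, teacher_predict) -> float:
    """Fraction of test inputs on which the student matches the teacher."""
    agree = sum(student(x) == teacher_predict(x) for x in test_inputs)
    return agree / len(test_inputs)

def explanation_value(train_student, train_data, explanations,
                      test_inputs, teacher_predict) -> float:
    # Student A: trained to simulate the teacher from inputs and teacher outputs.
    plain = train_student(train_data, explanations=None)
    # Student B: same data, but explanations are attached at training time only.
    explained = train_student(train_data, explanations=explanations)
    # Explanations are withheld from both students at test time.
    return (simulation_accuracy(explained, test_inputs, teacher_predict)
            - simulation_accuracy(plain, test_inputs, teacher_predict))
```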
New Protocols and Negative Results for Textual Entailment Data Collection
Sam Bowman
Emily Blythe Pitler
EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing (to appear)
Natural language inference (NLI) data has proven useful in benchmarking and, especially, as pretraining data for tasks requiring language understanding. However, the crowdsourcing protocol that was used to collect this data has known issues and was not explicitly optimized for either of these purposes, so it is likely far from ideal. We propose four alternative protocols, each aimed at improving either the ease with which annotators can produce sound training examples or the quality and diversity of those examples. Using these alternatives and a fifth baseline protocol, we collect and compare five new 8.5k-example training sets. In evaluations focused on transfer learning applications, our results are solidly negative, with models trained on our baseline dataset yielding good transfer performance to downstream tasks, but none of our four new methods (nor the recent ANLI) showing any improvements over that baseline. In a small silver lining, we observe that all four new protocols, especially those where annotators edit pre-filled text boxes, reduce previously observed issues with annotation artifacts.
Entities as Experts: Sparse Memory Access with Entity Supervision
Thibault Févry
Eunsol Choi
EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing (to appear)
We focus on the problem of capturing declarative knowledge about entities in the learned parameters of a language model. We introduce a new model—Entities as Experts (EAE)— that can access distinct memories of the entities mentioned in a piece of text. Unlike previous efforts to integrate entity knowledge into sequence models, EAE’s entity representations are learned directly from text. We show that EAE’s learned representations capture sufficient knowledge to answer TriviaQA questions such as “Which Dr. Who villain has been played by Roger Delgado, Anthony Ainley, Eric Roberts?”, outperforming an encoder-generator Transformer model with 10× the parameters. According to the LAMA knowledge probes, EAE contains more factual knowledge than a similarly sized BERT, as well as previous approaches that integrate external sources of entity knowledge. Because EAE associates parameters with specific entities, it only needs to access a fraction of its parameters at inference time, and we show that the correct identification and representation of entities is essential to EAE’s performance.
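As an illustration of the sparse memory access described above, a minimal sketch with hypothetical shapes: a mention-span vector queries a matrix of entity embeddings, only the top-scoring rows are read, and the retrieved embedding is added back into the contextual representation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, dim = 1000, 16
entity_memory = rng.standard_normal((num_entities, dim))   # one row per entity

def access_memory(mention_vec: np.ndarray, top_k: int = 1) -> np.ndarray:
    """Read only the top-k best-matching entity rows (sparse access)."""
    scores = entity_memory @ mention_vec
    top = np.argsort(scores)[-top_k:]
    return entity_memory[top].mean(axis=0)

mention_vec = rng.standard_normal(dim)        # stand-in for a span encoding
augmented = mention_vec + access_memory(mention_vec)
print(augmented.shape)                        # entity knowledge folded back in
```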
Empirical Evaluation of Pretraining Strategies for Supervised Entity Linking
Thibault Févry
AKBC 2020 - Automated Knowledge Base Construction
In this work, we present an entity linking model which combines a Transformer architecture with large scale pretraining from Wikipedia links. Our model achieves the state-of-the-art on two commonly used entity linking datasets: 96.7% on CoNLL and 94.9% on TAC-KBP. We present detailed analyses to understand what design choices are important for entity linking, including choices of negative entity candidates, Transformer architecture, and input perturbations. Lastly, we present promising results on more challenging settings such as end-to-end entity linking and entity linking without in-domain training data.
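To illustrate one of the design choices discussed (negative entity candidates), here is a minimal, hypothetical sketch of training a mention encoder with a softmax over the gold entity and sampled negatives; it is a generic formulation, not the paper's exact objective.

```python
import numpy as np

def candidate_loss(mention_vec, gold_vec, negative_vecs) -> float:
    """Cross-entropy over [gold entity] + negative candidates for one mention."""
    candidates = np.vstack([gold_vec] + list(negative_vecs))
    logits = candidates @ mention_vec
    log_z = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return float(-(logits[0] - log_z))        # gold entity sits at index 0

rng = np.random.default_rng(0)
dim = 8
mention = rng.standard_normal(dim)
gold = mention + 0.1 * rng.standard_normal(dim)   # encoder places gold nearby
negatives = rng.standard_normal((5, dim))         # sampled negative candidates
print(candidate_loss(mention, gold, negatives))
```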
Learning Cross-Context Entity Representations from Text
Jeffrey Ling
Zifei Shan
Thibault Févry
arXiv (2020)
Language modeling tasks, in which words, or word-pieces, are predicted on the basis of a local context, have been very effective for learning word embeddings and context dependent representations of phrases. Motivated by the observation that efforts to code world knowledge into machine readable knowledge bases or human readable encyclopedias tend to be entity-centric, we investigate the use of a fill-in-the-blank task to learn context independent representations of entities from the text contexts in which those entities were mentioned. We show that large scale training of neural models allows us to learn high quality entity representations, and we demonstrate successful results on four domains: (1) existing entity-level typing benchmarks, including a 64% error reduction over previous work on TypeNet (Murty et al., 2018); (2) a novel few-shot category reconstruction task; (3) existing entity linking benchmarks, where we match the state-of-the-art on CoNLL-Aida without linking-specific features and obtain a score of 89.8% on TAC-KBP 2010 without using any alias table, external knowledge base, or in-domain training data; and (4) answering trivia questions, which uniquely identify entities. Our global entity representations encode fine-grained type categories, such as Scottish footballers, and can answer trivia questions such as: Who was the last inmate of Spandau jail in Berlin?
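A minimal sketch of the fill-in-the-blank objective, with hypothetical entities and a stand-in encoder: the entity mention is blanked out of its context, the context is encoded, and the loss is a softmax over the entity vocabulary with the blanked-out entity as the label.

```python
import numpy as np

rng = np.random.default_rng(0)
entity_emb = {e: rng.standard_normal(16)
              for e in ["Spandau_Prison", "Rudolf_Hess", "Berlin"]}

def encode_context(blanked_text: str) -> np.ndarray:
    # Stand-in for a Transformer encoding of "... last inmate of [BLANK] ..."
    return rng.standard_normal(16)

def fill_in_blank_loss(blanked_text: str, gold_entity: str) -> float:
    """Softmax over the entity vocabulary; the blanked-out entity is the label."""
    ctx = encode_context(blanked_text)
    names = list(entity_emb)
    logits = np.array([ctx @ entity_emb[e] for e in names])
    log_z = logits.max() + np.log(np.exp(logits - logits.max()).sum())
    return float(-(logits[names.index(gold_entity)] - log_z))

print(fill_in_blank_loss(
    "Rudolf Hess was the last inmate of [BLANK] in Berlin.", "Spandau_Prison"))
```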