Dan Gillick
Authored Publications
NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 (to appear)
Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that requires only 10⁻⁶% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the-art dual encoder retriever, which still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models) as a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This model architecture can leverage existing pre-trained checkpoints and can be fine-tuned to efficiently construct document representations that do not require neural processing of queries.
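As a rough illustration of the serving-time picture described above (not the paper's exact formulation), a lexicalized scoring function can be stored as a sparse map from terms to weights produced offline; at query time, scoring reduces to summing a document's weights for the query's terms, which runs comfortably on a CPU. The document ids and term weights below are made up.

```python
# Illustrative sketch: each indexed document is a sparse map from vocabulary
# terms to weights produced offline by a neural model. At serving time no
# neural network runs on the query: the score is a sum of the document's
# weights for the query's terms, which is cheap on commodity CPUs.

def lexical_score(query_tokens: list[str], doc_term_weights: dict[str, float]) -> float:
    """Score a document as the sum of its weights for terms in the query."""
    return sum(doc_term_weights.get(tok, 0.0) for tok in query_tokens)

def rerank(query_tokens: list[str],
           candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Rerank candidate documents (e.g., from BM25) by lexical score."""
    scored = [(doc_id, lexical_score(query_tokens, w)) for doc_id, w in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy index with made-up term weights.
index = {
    "doc1": {"reranker": 2.0, "neural": 1.2, "gpu": 0.4},
    "doc2": {"reranker": 0.3, "lexical": 1.5, "cpu": 1.1},
}
print(rerank(["lexical", "reranker", "cpu"], index))
```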
Adapting Language Models to Temporal Knowledge
Bhuwan Dhingra
Transactions of the ACL (2021)
It is only a matter of time before facts become out of date: from the name of POTUS to the basketball team LeBron James plays for. This continuously limits the usefulness of previously collected datasets and the language models (LMs) trained on them. The problem is exacerbated when LMs are used in the closed-book question answering setting, where the pretraining data must contain the facts for the model to remember within its fixed parameters. A frequent paradigm is to update or refresh the dataset every so often, then retrain models with the new data: this is costly, but does it work? In this paper, we introduce a diagnostic dataset for probing LMs for factual knowledge that changes over time. Using it, we show that models trained only on the most recent slice of data perform worse on questions about the past than models trained on data drawn uniformly across time, while being better on current and future questions. Moreover, we propose jointly modeling text with the time it was created and show that this improves memorization of past facts, as well as reasoning about the uncertainty around future facts. We also show that models trained with temporal context can be refreshed efficiently as new data arrives, without retraining from scratch.
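One simple way to picture jointly modeling text with its creation time is to prepend each training example with a timestamp so the LM can condition on it; the prefix format below is an illustrative assumption, not the paper's exact specification.

```python
from typing import Iterable

def add_time_prefix(text: str, year: int) -> str:
    """Prepend the document's creation year so the LM can condition on time."""
    return f"year: {year} text: {text}"

def make_training_examples(docs: Iterable[tuple[str, int]]) -> list[str]:
    # Each document keeps its own timestamp, so one model is trained on all
    # time slices rather than only the most recent one.
    return [add_time_prefix(text, year) for text, year in docs]

examples = make_training_examples([
    ("The president of the United States is ...", 2017),
    ("LeBron James plays for ...", 2020),
])
print(examples[0])  # "year: 2017 text: The president of the United States is ..."
```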
MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network
Jan A. Botha
Dan Bikel
Andrew McCallum
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online (2021), pp. 278-285
We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems, which represent each entity with a single vector, we build a contextualized mention encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class prototypes": inference retrieves from the full set of labeled entity mentions in the training set and applies the entity label of the nearest mention neighbor. Our model is trained on a large multilingual corpus of mention pairs derived from Wikipedia hyperlinks, and performs nearest neighbor inference on an index of 700 million mentions. It is simpler to train, gives more interpretable predictions, and outperforms all other systems on two multilingual entity linking benchmarks.
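The inference step can be sketched as a nearest-neighbor lookup over an index of labeled mention embeddings; the brute-force search below stands in for the approximate nearest neighbor index used at the scale of 700 million mentions, and the vectors and entity ids are toy values.

```python
import numpy as np

def link_mention(query_vec: np.ndarray,
                 mention_vecs: np.ndarray,       # (num_indexed_mentions, dim)
                 mention_entities: list[str]) -> str:
    """Return the entity label of the most similar indexed mention."""
    sims = mention_vecs @ query_vec              # dot-product similarity
    return mention_entities[int(np.argmax(sims))]

# Toy index: three labeled mentions of two entities.
index_vecs = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.1, 0.9]])
index_entities = ["Q1", "Q1", "Q2"]
print(link_mention(np.array([0.85, 0.15]), index_vecs, index_entities))  # -> "Q1"
```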
Entity Linking in 100 Languages
Jan A. Botha
Zifei Shan
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 7833-7845
We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic knowledge base. We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results from a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.
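One common way to train a dual encoder of this kind, consistent with the setup described above but not necessarily identical to it, is an in-batch softmax loss in which every other entity in the batch acts as a negative; the sketch below omits the paper's hard-negative mining and auxiliary entity-pairing task, and the embeddings are random stand-ins.

```python
import numpy as np

def in_batch_softmax_loss(mention_vecs: np.ndarray, entity_vecs: np.ndarray) -> float:
    """mention_vecs, entity_vecs: (batch, dim); row i of each forms a gold pair."""
    logits = mention_vecs @ entity_vecs.T                   # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))              # gold entity sits on the diagonal

# Toy batch of 4 mention/entity embedding pairs.
rng = np.random.default_rng(0)
print(in_batch_softmax_loss(rng.standard_normal((4, 8)), rng.standard_normal((4, 8))))
```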
Learning Dense Representations for Entity Retrieval
Larry Lansing
Diego Garcia-Olano
CoNLL (2019)
We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search. Unlike prior work, this setup does not rely on an alias table followed by a re-ranker, and is thus the first fully learned entity retrieval model. We show that our dual encoder, trained using only anchor-text links in Wikipedia, outperforms discrete alias table and BM25 baselines, and is competitive with the best comparable results on the standard TACKBP-2010 dataset. In addition, it can retrieve candidates extremely fast, and generalizes well to a new dataset derived from Wikinews. On the modeling side, we demonstrate the dramatic value of an unsupervised negative mining algorithm for this task.
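The negative mining idea can be pictured as follows: score each training mention against the entity table with the current encoders and keep the top-scoring non-gold entities as extra negatives for the next training round. The procedure and names below are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def mine_hard_negatives(mention_vecs: np.ndarray,    # (num_mentions, dim)
                        entity_vecs: np.ndarray,     # (num_entities, dim)
                        gold_ids: list[int],
                        k: int = 5) -> list[list[int]]:
    """For each mention, return its k highest-scoring non-gold entities."""
    scores = mention_vecs @ entity_vecs.T
    hard_negatives = []
    for i, gold in enumerate(gold_ids):
        ranked = np.argsort(-scores[i])              # entity ids by descending score
        hard_negatives.append([int(e) for e in ranked if e != gold][:k])
    return hard_negatives

rng = np.random.default_rng(0)
print(mine_hard_negatives(rng.standard_normal((2, 4)), rng.standard_normal((10, 4)), gold_ids=[3, 7]))
```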
We address the problem of fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages. Such text is increasingly prevalent online, in documents, social media, and message boards. In this paper, we show that a feed-forward network with a simple globally constrained decoder can accurately and rapidly label both codemixed and monolingual text in 100 languages and 100 language pairs. This model outperforms previously published multilingual approaches in terms of both accuracy and speed, yielding an 800x speed-up and a 19.2% averaged absolute gain on three codemixed datasets.
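The globally constrained decoding can be sketched as a search over allowed label sets: per-token scores come from the (unshown) feed-forward network, and the sentence is restricted to a single language or a single language pair. The constraint set and scores below are illustrative, not the paper's exact decoder.

```python
import itertools
import numpy as np

def constrained_decode(token_scores: np.ndarray, languages: list[str]) -> list[str]:
    """token_scores: (num_tokens, num_languages) per-token label scores."""
    best_total, best_labels = -np.inf, None
    # Allowed label sets: every single language and every unordered pair.
    candidate_sets = [(i,) for i in range(len(languages))]
    candidate_sets += list(itertools.combinations(range(len(languages)), 2))
    for allowed in candidate_sets:
        sub = token_scores[:, list(allowed)]         # restrict each token to the allowed set
        total = float(sub.max(axis=1).sum())
        if total > best_total:
            picks = sub.argmax(axis=1)
            best_total = total
            best_labels = [languages[allowed[p]] for p in picks]
    return best_labels

# Toy per-token scores over three languages (would come from the network).
scores = np.log(np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.8, 0.1],
                          [0.6, 0.3, 0.1]]))
print(constrained_decode(scores, ["en", "es", "fr"]))  # ['en', 'es', 'en']
```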
We describe an LSTM-based model, which we call Byte-to-Span (BTS), that reads text as bytes and outputs span annotations of the form [start, length, label], where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on Unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, yet they produce results similar to or better than state-of-the-art Part-of-Speech tagging and Named Entity Recognition systems that use only the provided training data (no external data sources). Our models learn "from scratch" in that they do not rely on any elements of the standard NLP pipeline (including tokenization), and can thus run standalone on raw text.
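The input/output convention is easy to make concrete: text is consumed as raw UTF-8 bytes, and predictions are [start, length, label] triples over byte positions. The helpers below only illustrate that data format (the LSTM itself is not shown), and the example spans are made up.

```python
def text_to_bytes(text: str) -> list[int]:
    """Model input: the raw UTF-8 byte values of the text."""
    return list(text.encode("utf-8"))

def spans_to_annotations(text: str,
                         spans: list[tuple[int, int, str]]) -> list[tuple[str, str]]:
    """Map predicted (start, length, label) byte spans back to surface strings."""
    data = text.encode("utf-8")
    return [(data[start:start + length].decode("utf-8", errors="replace"), label)
            for start, length, label in spans]

sentence = "Barack Obama visited Paris."
predicted = [(0, 12, "PER"), (21, 5, "LOC")]      # made-up model output
print(text_to_bytes(sentence)[:5])                # [66, 97, 114, 97, 99]
print(spans_to_annotations(sentence, predicted))  # [('Barack Obama', 'PER'), ('Paris', 'LOC')]
```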
Exploring the Steps of Verb Phrase Ellipsis
Zhengzhong Liu
Edgar Gonzàlez
Workshop on Coreference Resolution Beyond OntoNotes at NAACL 2016
Verb Phrase Ellipsis is a well-studied topic in theoretical linguistics but has received little attention as a computational problem. Here we propose a decomposition of the overall resolution problem into three tasks—target detection, antecedent head resolution, and antecedent boundary detection—and implement a number of computational approaches for each one. We also explore the relationships among these tasks by attempting joint learning over different combinations. Our new decomposition of the problem yields significantly improved performance on publicly available datasets, including a newly contributed one.
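The three-step decomposition can be pictured as a small pipeline; the stub heuristics below are placeholders for the learned models and are not the paper's features or classifiers.

```python
def detect_targets(tokens: list[str]) -> list[int]:
    """Step 1: indices of auxiliaries that act as ellipsis targets (stub heuristic)."""
    return [i for i, tok in enumerate(tokens) if tok.lower() in {"do", "does", "did"}]

def resolve_antecedent_head(tokens: list[str], target: int) -> int:
    """Step 2: pick the antecedent's head verb (stub: nearest preceding verb)."""
    verbs = {"likes", "ate", "plays", "visited"}             # toy verb list
    heads = [i for i, tok in enumerate(tokens[:target]) if tok.lower() in verbs]
    return heads[-1] if heads else -1

def detect_antecedent_boundary(tokens: list[str], head: int, target: int) -> tuple[int, int]:
    """Step 3: expand the head into a span (stub: stop at a conjunction or the target)."""
    end = head + 1
    while end < target and tokens[end].lower() not in {"and", "but", ","}:
        end += 1
    return head, end

tokens = "Mary likes apples and John does too .".split()
for t in detect_targets(tokens):
    head = resolve_antecedent_head(tokens, t)
    start, end = detect_antecedent_boundary(tokens, head, t)
    print("target:", tokens[t], "| antecedent:", " ".join(tokens[start:end]))
# target: does | antecedent: likes apples
```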
We propose a new approach to fine-grained entity type classification based on label embeddings that allow for information sharing among related labels. Specifically, we learn an embedding for each label and each feature such that labels which frequently co-occur are close in the embedded space. We show that our approach outperforms state-of-the-art methods on two fine-grained entity classification benchmarks and that the model can exploit the finer-grained labels to improve classification of standard coarse types.
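Scoring in a joint label/feature embedding space of this kind can be sketched as a dot product between a mention representation (a sum of its feature embeddings) and each label embedding; the feature names, labels, and random vectors below are stand-ins for the learned model.

```python
import numpy as np

# Random vectors stand in for learned embeddings; in the trained model, labels
# that frequently co-occur end up close together, so related types score similarly.
rng = np.random.default_rng(0)
feature_emb = {f: rng.standard_normal(16)
               for f in ["head=obama", "context=president", "shape=Xxxxx"]}
label_emb = {l: rng.standard_normal(16)
             for l in ["/person", "/person/politician", "/location"]}

def score_labels(mention_features: list[str]) -> list[tuple[str, float]]:
    """Score each type label against the sum of the mention's feature embeddings."""
    m = np.sum([feature_emb[f] for f in mention_features], axis=0)
    scored = [(label, float(m @ v)) for label, v in label_emb.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(score_labels(["head=obama", "context=president"]))
```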
Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automatically, using resolved entities and their types extracted from a knowledge base. However, since the appropriate type often depends on context (e.g. Washington could be tagged either as city or government), this procedure can result in spurious labels, leading to poorer generalization. We propose the task of context-dependent fine type tagging, where the set of acceptable labels for a mention is restricted to only those deducible from the local context (e.g. sentence or document). We introduce new resources for this task: 11,304 mentions annotated with their context-dependent fine types, and we provide baseline experimental results on this data.
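A toy illustration of the distinction drawn above: a knowledge base assigns Washington several types, while the context-dependent formulation keeps only those the local sentence supports. The type inventory and keyword filter below are purely illustrative.

```python
# A knowledge base assigns "Washington" several types; context-dependent
# tagging keeps only those the local sentence supports.
kb_types = {"Washington": {"/location/city", "/government", "/person"}}

def context_dependent_types(entity: str, sentence: str) -> set[str]:
    """Keep only KB types supported by the sentence (toy keyword heuristic)."""
    allowed = set()
    if any(w in sentence for w in ("visited", "streets", "flew to")):
        allowed.add("/location/city")
    if any(w in sentence for w in ("announced", "officials", "policy")):
        allowed.add("/government")
    return kb_types[entity] & allowed

print(context_dependent_types("Washington", "Washington announced new tariffs."))   # {'/government'}
print(context_dependent_types("Washington", "We visited Washington last spring."))  # {'/location/city'}
```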