Dan Gillick

Authored Publications
    NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders
    Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 (to appear)
    Neural document rerankers are extremely effective in terms of accuracy. However, the best models require dedicated hardware for serving, which is costly and often not feasible. To avoid this serving-time requirement, we present a method of capturing up to 86% of the gains of a Transformer cross-attention model with a lexicalized scoring function that only requires 10^-6% of the Transformer's FLOPs per document and can be served using commodity CPUs. When combined with a BM25 retriever, this approach matches the quality of a state-of-the-art dual encoder retriever, which still requires an accelerator for query encoding. We introduce NAIL (Non-Autoregressive Indexing with Language models) as a model architecture that is compatible with recent encoder-decoder and decoder-only large language models, such as T5, GPT-3 and PaLM. This model architecture can leverage existing pre-trained checkpoints and can be fine-tuned for efficiently constructing document representations that do not require neural processing of queries.
    It is only a matter of time before facts become out of date: from the name of POTUS to the basketball team LeBron James plays for. This continuously limits the usefulness of previously collected datasets and language models (LMs) trained on them. This problem is exacerbated as LMs are used in the closed book question answering setting, where the pretraining data must contain the facts for the model to remember within its fixed parameters. A frequent paradigm is to update or refresh the dataset every so often, then retrain models with the new data: this is costly, but does it work? In this paper, we introduce a diagnostic dataset for probing LMs for factual knowledge that changes over time. Using it we show that models trained only on the most recent slice of data perform worse on questions about the past than models trained on uniform data across time, while being better on current and future questions. Moreover, we propose jointly modeling text with the time it was created and show that this improves memorization of previous facts, as well as reasoning about the uncertainty around future facts. We also show that models trained with temporal context allow for efficient refreshes as new data arrives without the need for retraining from scratch.
    MOLEMAN: Mention-Only Linking of Entities with a Mention Annotation Network
    Jan A. Botha
    Dan Bikel
    Andrew McCallum
    Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online (2021), pp. 278-285
    We present an instance-based nearest neighbor approach to entity linking. In contrast to most prior entity retrieval systems which represent each entity with a single vector, we build a contextualized mention-encoder that learns to place similar mentions of the same entity closer in vector space than mentions of different entities. This approach allows all mentions of an entity to serve as "class prototypes" as inference involves retrieving from the full set of labeled entity mentions in the training set and applying the nearest mention neighbor’s entity label. Our model is trained on a large multilingual corpus of mention pairs derived from Wikipedia hyperlinks, and performs nearest neighbor inference on an index of 700 million mentions. It is simpler to train, gives more interpretable predictions, and outperforms all other systems on two multilingual entity linking benchmarks.
    Entity Linking in 100 Languages
    Jan A. Botha
    Zifei Shan
    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 7833-7845
    We propose a new formulation for multilingual entity linking, where language-specific mentions resolve to a language-agnostic Knowledge Base. We train a dual encoder in this new setting, building on prior work with improved feature representation, negative mining, and an auxiliary entity-pairing task, to obtain a single entity retrieval model that covers 100+ languages and 20 million entities. The model outperforms state-of-the-art results from a far more limited cross-lingual linking task. Rare entities and low-resource languages pose challenges at this large scale, so we advocate for an increased focus on zero- and few-shot evaluation. To this end, we provide Mewsli-9, a large new multilingual dataset matched to our setting, and show how frequency-based analysis provided key insights for our model and training enhancements.
    We show that it is feasible to perform entity linking by training a dual encoder (two-tower) model that encodes mentions and entities in the same dense vector space, where candidate entities are retrieved by approximate nearest neighbor search. Unlike prior work, this setup does not rely on an alias table followed by a re-ranker, and is thus the first fully learned entity retrieval model. We show that our dual encoder, trained using only anchor-text links in Wikipedia, outperforms discrete alias table and BM25 baselines, and is competitive with the best comparable results on the standard TACKBP-2010 dataset. In addition, it can retrieve candidates extremely fast, and generalizes well to a new dataset derived from Wikinews. On the modeling side, we demonstrate the dramatic value of an unsupervised negative mining algorithm for this task.
    We address the problem of fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages. Such text is increasingly prevalent online, in documents, social media, and message boards. In this paper, we show that a feed-forward network with a simple globally constrained decoder can accurately and rapidly label both codemixed and monolingual text in 100 languages and 100 language pairs. This model outperforms previously published multilingual approaches in terms of both accuracy and speed, yielding an 800x speed-up and a 19.2% averaged absolute gain on three codemixed datasets.
    Exploring the Steps of Verb Phrase Ellipsis
    Zhengzhong Liu
    Edgar Gonzàlez
    Workshop on Coreference Resolution Beyond OntoNotes at NAACL 2016
    Verb Phrase Ellipsis is a well-studied topic in theoretical linguistics but has received little attention as a computational problem. Here we propose a decomposition of the overall resolution problem into three tasks—target detection, antecedent head resolution, and antecedent boundary detection—and implement a number of computational approaches for each one. We also explore the relationships among these tasks by attempting joint learning over different combinations. Our new decomposition of the problem yields significantly improved performance on publicly available datasets, including a newly contributed one.
    We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads text as bytes and outputs span annotations of the form [start, length, label] where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, but produce results similar to or better than the state-of-the-art in Part-of-Speech tagging and Named Entity Recognition that use only the provided training datasets (no external data sources). Our models are learning “from scratch” in that they do not rely on any elements of the standard pipeline in Natural Language Processing (including tokenization), and thus can run in standalone fashion on raw text.
    We propose a new approach to the task of fine-grained entity type classification based on label embeddings that allows for information sharing among related labels. Specifically, we learn an embedding for each label and each feature such that labels which frequently co-occur are close in the embedded space. We show that it outperforms state-of-the-art methods on two fine-grained entity-classification benchmarks and that the model can exploit the finer-grained labels to improve classification of standard coarse types.
    A New Entity Salience Task with Millions of Training Examples
    Jesse Dunietz
    Proceedings of the European Association for Computational Linguistics, Association for Computational Linguistics (2014)
    Although many NLP systems are moving toward entity-based processing, most still identify important phrases using classical keyword-based approaches. To bridge this gap, we introduce the task of entity salience: assigning a relevance score to each entity in a document. We demonstrate how a labeled corpus for the task can be automatically generated from a corpus of documents and accompanying abstracts. We then show how a classifier with features derived from a standard NLP pipeline outperforms a strong baseline by 34%. Finally, we outline initial experiments on further improving accuracy by leveraging background knowledge about the relationships between entities.
    Entity type tagging is the task of assigning category labels to each mention of an entity in a document. While standard systems focus on a small set of types, recent work (Ling and Weld, 2012) suggests that using a large fine-grained label set can lead to dramatic improvements in downstream tasks. In the absence of labeled training data, existing fine-grained tagging systems obtain examples automatically, using resolved entities and their types extracted from a knowledge base. However, since the appropriate type often depends on context (e.g. Washington could be tagged either as city or government), this procedure can result in spurious labels, leading to poorer generalization. We propose the task of context-dependent fine type tagging, where the set of acceptable labels for a mention is restricted to only those deducible from the local context (e.g. sentence or document). We introduce new resources for this task: 11,304 mentions annotated with their context-dependent fine types, and we provide baseline experimental results on this data.