
Publications

Publishing our work allows us to share ideas and work collaboratively to advance the field of computer science.


    Conformal Language Modeling
    Victor Quach
    Adam Fisch
    Adam Yala
    Jae Ho Sohn
    Tommi Jaakkola
    Regina Barzilay
    ICLR (2024) (to appear)
    Abstract: In this paper, we propose a novel approach to conformal prediction (CP) that is adapted to generative, large language models (LLMs). Conformal prediction is a popular technique for deriving prediction sets from machine learning models that have rigorous, statistical performance guarantees. We extend conformal techniques to a broad class of language models that sample from a conditional distribution over the combinatorial, unbounded space of possible text outputs, given some input prompt. Specifically, we translate the process of constructing prediction sets into calibrating a stopping rule, under which we draw diverse samples from our model until we are confident that the growing set of candidate answers includes at least one high-quality response. At the same time, we calibrate a rejection rule to selectively discard low-quality or redundant responses to reduce sample noise. Under minimal assumptions, we theoretically prove that our resulting output sets contain at least one high-quality answer with some desired probability that a user can set (such as 90%), while still remaining empirically precise on average. Furthermore, within this set of sampled candidate answers, we show that we can also accurately identify subsets of individual components (e.g., phrases or sentences) that are each independently correct (e.g., that are not "hallucinations"), again with provably high probability. We demonstrate the effectiveness of our approach on multiple types of large language models applied to tasks in open-domain question answering, text summarization, and radiology report generation.
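A minimal sketch of the calibrated sampling loop the abstract describes. The sampler, quality score, and redundancy check are hypothetical stand-ins, and the two thresholds are illustrative constants rather than values derived from an actual conformal calibration set:

```python
import random

def sample_response(prompt):
    return f"candidate-{random.randint(0, 9)}"   # stand-in for LLM sampling

def quality_score(prompt, response):
    return random.random()                        # stand-in confidence score

def is_redundant(response, kept):
    return response in kept                       # toy similarity check

def conformal_generate(prompt, quality_thresh=0.7, stop_thresh=0.9, max_samples=20):
    """Draw samples until the calibrated stopping rule fires."""
    kept, confidence = [], 0.0
    for _ in range(max_samples):
        cand = sample_response(prompt)
        score = quality_score(prompt, cand)
        # Rejection rule: keep only novel candidates above the quality bar.
        if score >= quality_thresh and not is_redundant(cand, kept):
            kept.append(cand)
        # Stopping rule: stop once we are confident the set contains a
        # high-quality answer (max single-sample score as a crude proxy).
        confidence = max(confidence, score)
        if confidence >= stop_thresh:
            break
    return kept

print(conformal_generate("Summarize this radiology report: ..."))
```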
    Abstract: Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful by employing large language models (LLMs) to directly interact with embeddings, transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
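One way to picture the injection step, as a rough sketch: an external embedding is projected into the LLM's input-embedding space and prepended as a "soft token". The dimensions and the projection matrix W below are illustrative stand-ins (in practice the projection would be learned), not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: a 64-d external embedding mapped into a
# 128-d LLM input-embedding space via a (here random) projection.
EMB_DIM, MODEL_DIM = 64, 128
W = rng.normal(scale=0.02, size=(EMB_DIM, MODEL_DIM))

def inject(embedding, prompt_token_embeddings):
    """Prepend the projected embedding as a soft token before the prompt."""
    soft_token = embedding @ W                      # shape (MODEL_DIM,)
    return np.vstack([soft_token, prompt_token_embeddings])

entity_vec = rng.normal(size=EMB_DIM)               # e.g. a recommender user vector
prompt = rng.normal(size=(5, MODEL_DIM))            # 5 embedded prompt tokens
sequence = inject(entity_vec, prompt)               # (6, MODEL_DIM) fed to the LLM
print(sequence.shape)
```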
    Abstract: As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages even from monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated into an English tuning set substantially improve multilingual instruction-following, in both seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-response pairs.
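A toy illustration of the data mixture the abstract measures: an English instruction-tuning set with a small multilingual slice mixed in. The pools, languages, and counts are placeholders:

```python
import random

# Placeholder pools; in practice these are instruction-response pairs.
english_pool = [("en", f"instruction {i}") for i in range(10_000)]
multilingual_pool = [(lang, f"instruction in {lang}")
                     for lang in ["de", "hi", "sw", "ja"] for _ in range(50)]

def build_tuning_set(n_multilingual=40, seed=0):
    """English tuning set plus a small multilingual slice, mirroring the
    finding that ~40 multilingual examples already move the needle."""
    rng = random.Random(seed)
    mix = english_pool + rng.sample(multilingual_pool, n_multilingual)
    rng.shuffle(mix)
    return mix

tuning_set = build_tuning_set()
print(len(tuning_set), tuning_set[0])
```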
    Abstract: Facilitated by large language models (LLMs), personalized text generation has become a rapidly growing research direction. Most existing studies focus on designing specialized models for a particular domain, or they require fine-tuning the LLMs to generate personalized text. We consider a typical scenario in which the large language model, which generates personalized output, is frozen and can only be accessed through APIs. Under this constraint, all one can do is improve the input text (i.e., the text prompts) sent to the LLM, a procedure that is usually done manually. In this paper, we propose a novel method to automatically revise prompts for personalized text generation. The proposed method takes the initial prompts generated by a state-of-the-art, multistage framework for personalized generation and rewrites a few critical components that summarize and synthesize the personal context. The prompt rewriter employs a training paradigm that chains together supervised learning (SL) and reinforcement learning (RL), where SL reduces the search space of RL and RL facilitates end-to-end training of the rewriter. Using datasets from three representative domains, we demonstrate that the rewritten prompts outperform both the original prompts and the prompts optimized via supervised learning or reinforcement learning alone. In-depth analysis of the rewritten prompts shows that they are not only human readable, but also able to guide manual revision of prompts when resources to train the prompt rewriter with reinforcement learning are limited, or when it is costly to deploy an automatic prompt rewriter for inference.
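The SL-then-RL chaining can be sketched on a toy rewriter whose "policy" just picks among a few rewrite operations; the action names, rewards, and the tabular REINFORCE setup are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy action space: candidate rewrite operations for one prompt component.
ACTIONS = ["keep", "summarize_context", "add_examples", "reorder"]
logits = np.zeros(len(ACTIONS))         # tabular "rewriter" policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def frozen_llm_reward(action):
    """Stand-in for scoring the frozen LLM's output under the rewritten
    prompt; a real reward would compare generations to references."""
    return {"keep": 0.2, "summarize_context": 0.8,
            "add_examples": 0.6, "reorder": 0.3}[ACTIONS[action]]

# Stage 1 (SL): nudge the policy toward supervised rewrite labels,
# shrinking the search space RL must then explore.
for label in [1, 1, 2]:                 # indices of labeled "good" rewrites
    logits[label] += 0.5

# Stage 2 (RL): REINFORCE on the frozen-LLM reward.
lr = 0.5
for _ in range(200):
    p = softmax(logits)
    a = rng.choice(len(ACTIONS), p=p)
    grad = -p
    grad[a] += 1.0                      # gradient of log pi(a)
    logits += lr * frozen_llm_reward(a) * grad

print(ACTIONS[int(np.argmax(softmax(logits)))])   # likely "summarize_context"
```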
    Helpful Neighbors: Leveraging Neighbors in Geographic Feature Pronunciation
    Llion Jones
    Haruko Ishikawa
    Transactions of the Association for Computational Linguistics, vol. 11 (2023), 85–101
    Abstract: If one sees the place name Houston Mercer Dog Run in New York, how does one know how to pronounce it? Assuming one knows that Houston in New York is pronounced ˈhaʊstən and not like the Texas city (ˈhjuːstən), then one can probably guess that ˈhaʊstən is also used in the name of the dog park. We present a novel architecture that learns to use the pronunciations of neighboring names in order to guess the pronunciation of a given target feature. Applied to Japanese place names, we demonstrate the utility of the model for finding and proposing corrections for errors in Google Maps. To demonstrate the utility of this approach on structurally similar problems, we also report on an application to a totally different task: cognate reflex prediction in comparative historical linguistics. A version of the code has been open-sourced.
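The core lookup idea can be illustrated in a few lines; the coordinates, neighbor list, and distance measure below are toy stand-ins for the learned architecture:

```python
# Pronounce a shared token (here "Houston") the way nearby,
# already-pronounced features do. Data is illustrative.
NEIGHBORS = [
    ((40.726, -74.001), "Houston Street", {"Houston": "ˈhaʊstən"}),
    ((29.760, -95.370), "Houston City Hall", {"Houston": "ˈhjuːstən"}),
]

def guess_pronunciation(token, lat, lon):
    """Take the token's pronunciation from the geographically nearest
    neighbor that has one (squared-degree proxy; fine at this scale)."""
    best = min(
        (n for n in NEIGHBORS if token in n[2]),
        key=lambda n: (n[0][0] - lat) ** 2 + (n[0][1] - lon) ** 2,
    )
    return best[2][token]

# "Houston Mercer Dog Run" in Manhattan inherits the New York reading.
print(guess_pronunciation("Houston", 40.727, -73.999))  # ˈhaʊstən
```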
    Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
    Fantine Huot
    Reinald Kim Amplayo
    Mirella Lapata
    Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (2023)
    Abstract: While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output.
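The blueprint workflow reduces to a simple data structure: a list of question-answer pairs that is generated first, can be edited by the user, and then conditions the final text. generate_plan and generate_text below are placeholders for model calls, not the demo's actual API:

```python
def generate_plan(query, documents):
    """Placeholder: a model would produce question-answer pairs here."""
    return [("What is the main finding?", "finding ..."),
            ("What evidence supports it?", "evidence ...")]

def generate_text(query, plan):
    """Placeholder: a model would realize the plan as fluent text."""
    return " ".join(answer for _, answer in plan)

query, docs = "summarize the study", ["doc1 ...", "doc2 ..."]
plan = generate_plan(query, docs)
plan[0] = ("What is the main finding?", "edited finding")  # user edits the blueprint
print(generate_text(query, plan))                          # regenerate from the edit
```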
    MoQA: Benchmarking Multi-Type Open-Domain Question Answering
    Howard Yen
    Tianyu Gao
    Danqi Chen
    Proceedings of the Third DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, Association for Computational Linguistics (2023), 8–29
    Abstract: Existing open-domain question answering research mainly focuses on questions that can be answered in a few words. However, information-seeking questions often require different formats of answers depending on the nature of the question, e.g., "Why is there a maple leaf on the Canadian flag?" In this paper, we present a new task, MoQA, which requires building QA models that can provide short, medium, long, and yes/no answers to open-domain questions simultaneously. We expand the Natural Questions dataset into the open-domain setting by keeping all types of questions and show that existing systems cannot generalize to these new types. We adapt state-of-the-art open-domain QA models, based on retriever-reader and phrase retrieval models, to tackle this task. Results and analyses of our multi-type QA models reveal the unique challenges of the task, calling for versatile QA models in the future.
    (QA)^2: Question Answering with Questionable Assumptions
    Phu Mon Htut
    Samuel R. Bowman
    Jackson Petty
    ACL (2023) (to appear)
    Abstract: Naturally occurring information-seeking questions often contain questionable assumptions, that is, assumptions that are false or unverifiable. Questions containing questionable assumptions are challenging because they require a distinct answer strategy that deviates from typical answers to information-seeking questions. For instance, the question "When did Marie Curie discover Uranium?" cannot be answered as a typical "when" question without addressing the false assumption "Marie Curie discovered Uranium". In this work, we propose (QA)^2 (Question Answering with Questionable Assumptions), an open-domain evaluation dataset consisting of naturally occurring search engine queries that may or may not contain questionable assumptions. To be successful on (QA)^2, systems must be able to detect questionable assumptions and also be able to produce adequate responses for both typical information-seeking questions and ones with questionable assumptions. Through human rater acceptability on abstractive QA with (QA)^2 questions, we find that current models do struggle with handling questionable assumptions, leaving substantial headroom for progress.
    Abstract: Formulating selective information needs results in queries that implicitly specify set operations, such as intersection, union, and difference. For instance, one might search for "shorebirds that are not sandpipers" or "science-fiction films shot in England". To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries with implicit set operations that map to sets of entities corresponding to Wikipedia documents. The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents and to correctly perform various set operations. The dataset is constructed semi-automatically using Wikipedia category names. Queries are automatically composed from individual categories, then paraphrased and further validated for naturalness and fluency by crowdworkers. Crowdworkers also assess the relevance of entities based on their documents and highlight the attribution of query constraints to spans of document text. We analyze several modern retrieval systems, finding that they often struggle on such queries. Queries involving negation and conjunction are particularly challenging, and systems are further challenged by combinations of these operations.
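The set operations such queries imply are easy to make explicit; a toy version over hand-picked entity sets (standing in for per-constraint retrieval results):

```python
# Entity sets per atomic constraint; illustrative members only.
shorebirds = {"sandpiper", "plover", "oystercatcher", "avocet"}
sandpipers = {"sandpiper"}
scifi_films = {"Alien", "Brazil", "Moon"}
shot_in_england = {"Brazil", "Moon", "Notting Hill"}

# "shorebirds that are not sandpipers" -> set difference
print(shorebirds - sandpipers)          # {'plover', 'oystercatcher', 'avocet'}
# "science-fiction films shot in England" -> set intersection
print(scifi_films & shot_in_england)    # {'Brazil', 'Moon'}
```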
    Joint Adaptive Representations for Image-Language Learning
    Transformers for Vision (T4V) Workshop at the Conference on Computer Vision and Pattern Recognition (CVPR) (2023)
    Abstract: Image-language transformer models have achieved tremendous success, but they come at high computational cost. We propose joint adaptive image-language representation learning, which adaptively and iteratively fuses the multi-modal features. This consistently reduces model cost and size, allows the model to scale without a large increase in FLOPs or memory, and outperforms bigger and much more expensive models. With only 40M training examples and 39 GFLOPs, our model outperforms models many times larger, some reaching 800 GFLOPs.
    Abstract: Large language models (LLMs) can be used to generate smaller, more refined datasets via few-shot prompting for benchmarking, fine-tuning, or other use cases. However, understanding and evaluating these datasets is difficult, and the failure modes of LLM-generated data are still not well understood. Specifically, the data can be repetitive in surprising ways, not only semantically but also syntactically and lexically. We present LinguisticLens, a novel interactive visualization tool for making sense of and analyzing the syntactic diversity of LLM-generated datasets. LinguisticLens clusters text along syntactic, lexical, and semantic axes. It supports hierarchical visualization of a text dataset, allowing users to quickly scan for an overview and inspect individual examples. The live demo is available at https://shorturl.at/zHOUV.
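A crude stand-in for the syntactic axis: reduce each token to a coarse class so that near-duplicate templates collapse onto one cluster key. The function-word list and signature scheme are illustrative, not LinguisticLens's actual method:

```python
from collections import defaultdict

FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "about"}

def syntactic_signature(sentence):
    """Keep function words verbatim, collapse content words to X, so
    sentences sharing a template map to the same key."""
    return " ".join(
        tok if tok in FUNCTION_WORDS else "X"
        for tok in sentence.lower().split()
    )

generated = [
    "The cat is in the garden",
    "The dog is in the kitchen",
    "Write a poem about autumn",
]
clusters = defaultdict(list)
for s in generated:
    clusters[syntactic_signature(s)].append(s)

for sig, members in clusters.items():
    print(sig, "->", members)   # the first two sentences share one template
```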
    Abstract: Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer, but learning can also be hindered when training updates from different languages are in conflict. In this article, we propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing, to reduce conflicts and increase positive transfer during fine-tuning. We introduce dynamic subnetworks, which are jointly updated with the model, and we combine our methods with meta-learning, an established but complementary technique for improving cross-lingual transfer. Finally, we provide extensive analyses of how each of our methods affects the models.
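A sketch of the subnetwork idea under simplifying assumptions (one shared parameter vector, hand-written binary masks): each language's gradient only touches its own subnetwork, so conflicting updates cannot overwrite each other everywhere, while overlapping mask entries still share:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared parameter vector plus a binary subnetwork mask per language.
# Masks overlap where sharing helps and diverge where updates conflict.
params = rng.normal(size=8)
masks = {
    "en": np.array([1, 1, 1, 1, 1, 0, 0, 0], dtype=float),
    "hi": np.array([1, 1, 1, 0, 0, 1, 1, 0], dtype=float),
}

def masked_update(lang, grad, lr=0.1):
    """Only the language's subnetwork receives the gradient."""
    global params
    params -= lr * masks[lang] * grad

masked_update("en", rng.normal(size=8))
masked_update("hi", rng.normal(size=8))
print(params)
```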
    Abstract: We introduce and study the problem of Continual Multilingual Learning (CML), where a previously trained multilingual model is periodically updated using new data arriving in stages. If the new data is present in only a subset of languages, we find that the resulting model shows improved performance only on the languages included in the latest update (and a few closely related languages), while its performance on all the remaining languages degrades significantly. We address this challenge by proposing LAFT-URIEL, a parameter-efficient fine-tuning strategy that aims to increase the number of languages on which the model improves after an update, while reducing the magnitude of loss in performance for the remaining languages. LAFT-URIEL uses linguistic knowledge to balance overfitting and knowledge sharing across languages, resulting in a 25% increase in the number of languages whose performance improves during an update and a 78% relative decrease in the average magnitude of losses on the remaining languages.
    Abstract: The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability in their training objective. In this paper, we propose PreSTU, a novel pre-training recipe dedicated to scene-text understanding (STU). PreSTU introduces OCR-aware pre-training objectives that encourage the model to recognize text from an image and connect it to the rest of the image content. We implement PreSTU using a simple transformer-based encoder-decoder architecture, combined with large-scale image-text datasets with scene text obtained from an off-the-shelf OCR system. We empirically demonstrate the effectiveness of this pre-training approach on eight visual question answering and four image captioning benchmarks.
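A rough sketch of what an OCR-aware pre-training example might look like: the target asks the model to transcribe the scene text (from an off-the-shelf OCR system) before describing the image, tying recognition to the captioning task. The field names and prompt format are assumptions, not the paper's actual schema:

```python
def make_ocr_aware_example(image_id, ocr_tokens, caption):
    """Build one (source, target) training pair; hypothetical format."""
    source = f"image:{image_id} task:transcribe_then_describe"
    target = "OCR: " + " ".join(ocr_tokens) + " CAPTION: " + caption
    return {"source": source, "target": target}

ex = make_ocr_aware_example(
    "img_001",
    ["OPEN", "24", "HOURS"],            # tokens from an OCR system
    "a diner with a neon sign at night",
)
print(ex["target"])
```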
    SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding
    Vasilisa Bashlovkina
    Riley Matthews
    Charles Kwong
    29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2023)
    Abstract: We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, economically, and politically important domain. We quantify the degree to which social media language differs from conventional language and conclude that the difference is significant both in terms of token distribution and rate of linguistic shift. Next, we introduce a new benchmark for Social MedIa Language Evaluation (SMILE) that covers four SM platforms and eleven tasks. Finally, we show that learning a tokenizer and pretraining on a mix of social media and conventional language yields an LM that outperforms the best similar-sized alternative by 4.2 points on the overall SMILE score.