Colin Cherry

Colin Cherry works in Machine Translation and Natural Language Processing. He's interested in creating learning algorithms that discover the underlying structures behind words and sentences, and making use of that structure to generate language. Previously, he was a Research Officer in Text Analytics at National Research Council Canada, and a Researcher in the Natural Language Processing group at Microsoft Research. He received his PhD from the University of Alberta.

He is a proud member of the Association for Computational Linguistics (ACL). He is currently serving as chair of the North American chapter of the ACL (NAACL), and as an action editor of the Transactions of the ACL (TACL).

More details at his personal homepage.

Authored Publications
    Prompting PaLM for Translation: Assessing Strategies and Performance
    Jiaming Luo
    Viresh Ratnakar
    George Foster
    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Toronto, Canada (2023), pp. 15406–15427
    Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages. We probe this ability in an in-depth study of the pathways language model (PaLM), which has demonstrated the strongest machine translation (MT) performance among similarly-trained LLMs to date. We investigate various strategies for choosing translation examples for few-shot prompting, concluding that example quality is the most important factor. Using optimized prompts, we revisit previous assessments of PaLM’s MT capabilities with more recent test sets, modern MT metrics, and human evaluation, and find that its performance, while impressive, still lags that of state-of-the-art supervised systems. We conclude by providing an analysis of PaLM’s MT output which reveals some interesting properties and prospects for future work.
    XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
    Sebastian Ruder
    Mihir Sanjay Kale
    Shruti Rijhwani
    Jean-Michel Sarr
    Cindy Wang
    John Wieting
    Christo Kirov
    Dana L. Dickinson
    Bidisha Samanta
    Connie Tao
    David Adelani
    Reeve Ingle
    Dmitry Panteleev
    Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Singapore, pp. 1856-1884
    Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs), i.e., languages for which NLP research is particularly far behind in meeting user needs, it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks, i.e., tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.
    A Natural Diet: Towards Improving Naturalness of Machine Translation Output
    David Grangier
    George Foster
    Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online (2022)
    Machine translation (MT) evaluation often focuses on accuracy and fluency, without paying much attention to translation style. This means that, even when considered accurate and fluent, MT output can still sound less natural than high-quality human translations or text originally written in the target language. Machine translation output notably exhibits lower lexical diversity and employs constructs that mirror those in the source sentence. In this work we propose a method for training MT systems to achieve a more natural style, i.e. mirroring the style of text originally written in the target language. Our method tags parallel training data according to the naturalness of the target side by contrasting language models trained on natural and translated data. Tagging data allows us to put greater emphasis on target sentences originally written in the target language. Automatic metrics show that the resulting models achieve lexical richness on par with human translations, mimicking a style much closer to sentences originally written in the target language. Furthermore, we find that their output is preferred by human experts when compared to the baseline translations.
    Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains. It has been recently shown that the probabilities given by a large, multilingual model can achieve state-of-the-art results when used as a reference-free metric. We experiment with various modifications to this model, and demonstrate that by scaling it up we can match the performance of BLEU. We analyze various potential weaknesses of the approach, and find that it is surprisingly robust and likely to offer reasonable performance across a broad spectrum of domains and different system qualities.
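    The tagging step described above scores each target sentence with two language models and tags the pair by which model prefers it. A minimal sketch, assuming hypothetical add-one-smoothed unigram models as stand-ins for the "natural" and "translated" LMs, a length-normalized log-probability difference as the score, a threshold of 0, and source-side tags named <natural>/<translated>; the abstract does not specify any of these choices.

```python
import math
from collections import Counter
from typing import List, Tuple

class UnigramLM:
    """Hypothetical stand-in LM: add-one-smoothed unigram model."""
    def __init__(self, sentences: List[str]):
        self.counts = Counter(tok for s in sentences for tok in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(self, sentence: str) -> float:
        return sum(math.log((self.counts[tok] + 1) / (self.total + self.vocab))
                   for tok in sentence.split())

def naturalness_score(sentence: str, natural_lm: UnigramLM, translated_lm: UnigramLM) -> float:
    # Positive when the LM trained on natural text prefers the sentence (per-token average).
    n = max(len(sentence.split()), 1)
    return (natural_lm.logprob(sentence) - translated_lm.logprob(sentence)) / n

def tag_pair(src: str, tgt: str, natural_lm: UnigramLM, translated_lm: UnigramLM,
             threshold: float = 0.0) -> Tuple[str, str]:
    # Assumed convention: prepend the tag to the source so the MT model can condition on it.
    natural = naturalness_score(tgt, natural_lm, translated_lm) > threshold
    tag = "<natural>" if natural else "<translated>"
    return f"{tag} {src}", tgt

natural_lm = UnigramLM(["the cat sat", "the dog ran"])
translated_lm = UnigramLM(["it is the cat which sat", "it is the dog"])
print(tag_pair("ein Satz", "the cat sat", natural_lm, translated_lm))  # ('<natural> ein Satz', 'the cat sat')
```

    At inference time, a system trained on such data could be asked for the natural style simply by prepending the same tag to every input.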
    Human-Paraphrased References Improve Neural Machine Translation
    George Foster
    David Grangier
    Proceedings of the Fifth Conference on Machine Translation (Volume 1: Research Papers) (2020)
    Automatic evaluation comparing candidate translations to human-generated paraphrases of reference translations has recently been proposed by Freitag et al. (2020). When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment. This effect holds for a variety of different automatic metrics, and tends to favor natural formulations over more literal (translationese) ones. In this paper we compare the results of performing end-to-end system development using standard and paraphrased references. With state-of-the-art English-German NMT components, we show that tuning to paraphrased references produces a system that is significantly better according to human judgment, but 5 BLEU points worse when tested on standard references. Our work confirms the finding that paraphrased references yield metric scores that correlate better with human judgment, and demonstrates for the first time that using these scores for system development can lead to significant improvements.
    Shaping the Narrative Arc: Information-Theoretic Collaborative Dialogue
    George Foster
    Marc G. Bellemare
    International Conference on Computational Creativity (2020)
    We consider the challenge of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives. Collaborative dialogue is distinct from chit-chat in that it is knowledge building: each utterance provides just enough information to add specificity and reduce ambiguity without limiting the conversation. We use concepts from information theory to define a narrative arc function which models dialogue progression. We demonstrate that this function can be used to modulate a generative conversation model and make it produce more interesting dialogues, compared to baseline outputs. We focus on two antithetical modes of modulation: reveal and conceal. Empirically, we show how the narrative arc function can model existing dialogues and shape conversation models towards either mode. We conclude with quantitative evidence suggesting that these modulated models provide interesting and engaging dialogue partners for improvisational theatre performers.
    Re-translation versus Streaming for Simultaneous Translation
    Naveen Ari
    George Foster
    IWSLT 2020, Association for Computational Linguistics
    There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available. We study a related problem in which revisions to the hypothesis beyond strictly appending words are permitted. This is suitable for applications such as live captioning an audio feed. In this setting, we compare custom streaming approaches to re-translation, a straightforward strategy where each new source token triggers a distinct translation from scratch. We find re-translation to be as good as or better than state-of-the-art streaming systems, even when operating under constraints that allow very few revisions. We attribute much of this success to a previously proposed data-augmentation technique that adds prefix-pairs to the training data, which alongside wait-k inference forms a strong baseline for streaming translation. We also highlight re-translation's ability to wrap arbitrarily powerful MT systems with an experiment showing large improvements from an upgrade to its base model.
    We investigate the problem of simultaneous machine translation of long-form speech content. We target a continuous speech-to-text scenario, generating translated captions for a live audio feed, such as a lecture or play-by-play commentary. As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows. This approach naturally exhibits very low latency and high final quality, but at the cost of incremental instability as the output is continuously refined. We experiment with a pipeline of industry-grade speech recognition and translation tools, augmented with simple inference heuristics to improve stability. We use TED Talks as a source of multilingual test data, developing our techniques on English-to-German spoken language translation. Our minimalist approach to simultaneous translation allows us to easily scale our final evaluation to six more target languages, dramatically improving incremental stability for all of them.
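    The two strategies compared above differ in how they consume the source. A minimal sketch of each, assuming a 0-based indexing convention and a hypothetical toy_translate function standing in for a full-sentence MT system: wait-k reads k source tokens before writing the first target token and then alternates one read per write, while re-translation translates the entire source prefix from scratch after every new token.

```python
from typing import Callable, List

def wait_k_schedule(k: int, src_len: int, tgt_len: int) -> List[int]:
    """Number of source tokens read before emitting each target token under wait-k."""
    # Target token i (0-based) waits for k + i source tokens, capped at the source length.
    return [min(k + i, src_len) for i in range(tgt_len)]

def retranslate_stream(src_tokens: List[str],
                       translate: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """Re-translation: every new source token triggers a fresh translation of the whole prefix."""
    return [translate(src_tokens[:t]) for t in range(1, len(src_tokens) + 1)]

# Hypothetical stand-in for a full-sentence MT system (here it just upper-cases tokens).
def toy_translate(prefix: List[str]) -> List[str]:
    return [tok.upper() for tok in prefix]

print(wait_k_schedule(k=2, src_len=5, tgt_len=6))          # [2, 3, 4, 5, 5, 5]
print(retranslate_stream(["a", "b", "c"], toy_translate))  # [['A'], ['A', 'B'], ['A', 'B', 'C']]
```

    With a real MT system, consecutive re-translations may rewrite earlier tokens rather than only appending, which is exactly the revision behaviour the streaming setting forbids.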
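    Incremental instability can be quantified by counting how many displayed tokens must be erased as each new re-translation replaces the previous one. A minimal sketch, assuming an erasure measure defined as the tokens of the previous output beyond its longest common prefix with the next output, normalized by the final output length, and assuming mask_k (withholding the last k tokens of every non-final output) as one illustrative stabilizing heuristic; the abstract does not name its metric or heuristics.

```python
from typing import List, Sequence

def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def normalized_erasure(outputs: List[List[str]]) -> float:
    """Tokens erased between consecutive outputs, divided by the final output length."""
    if not outputs or not outputs[-1]:
        return 0.0
    erased = sum(len(prev) - common_prefix_len(prev, cur)
                 for prev, cur in zip(outputs, outputs[1:]))
    return erased / len(outputs[-1])

def mask_k(outputs: List[List[str]], k: int) -> List[List[str]]:
    """Hypothetical stabilizing heuristic: hide the last k tokens of every non-final output."""
    shown = [out[:max(len(out) - k, 0)] for out in outputs[:-1]]
    return shown + outputs[-1:]

outs = [["das"], ["der", "Hund"], ["der", "Hund", "läuft"]]
print(normalized_erasure(outs))             # 0.333... (one erased token / three final tokens)
print(normalized_erasure(mask_k(outs, 1)))  # 0.0
```

    Withholding the most recent tokens trades a little latency for stability, since the tail of a hypothesis is the part most likely to change.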
    Inference Strategies for Machine Translation with Conditional Masking
    Julia Kreutzer
    George Foster
    Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (to appear)
    Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation. Given a trained CMLM, however, it is not clear what the best inference strategy is. We formulate masked inference as a factorization of conditional probabilities of partial sequences, show that this does not harm performance, and investigate a number of simple heuristics motivated by this perspective. We identify a thresholding strategy that has advantages over the standard "mask-predict" algorithm, and provide analyses of its behavior on machine translation tasks.
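    The contrast above is between how many positions get re-masked at each refinement step. A minimal sketch, assuming the linear-decay rule of standard mask-predict (re-mask the n lowest-confidence tokens, with n = N(T - t)/T at step t of T) and an assumed, simplified thresholding rule that re-masks every position whose probability is below a fixed tau; the paper's exact thresholding procedure is not given in the abstract.

```python
from typing import List

def mask_predict_positions(probs: List[float], t: int, iterations: int) -> List[int]:
    """Mask-predict: at refinement step t (1-based), re-mask the n lowest-probability positions."""
    # n decays linearly, N * (T - t) / T, so fewer tokens are re-masked each step.
    n = len(probs) * (iterations - t) // iterations
    lowest_first = sorted(range(len(probs)), key=lambda i: probs[i])
    return sorted(lowest_first[:n])

def threshold_positions(probs: List[float], tau: float) -> List[int]:
    """Thresholding (assumed form): re-mask every position whose probability is below tau."""
    return [i for i, p in enumerate(probs) if p < tau]

probs = [0.9, 0.2, 0.6, 0.4]  # hypothetical per-token confidences from one CMLM pass
print(mask_predict_positions(probs, t=1, iterations=4))  # [1, 2, 3]
print(threshold_positions(probs, tau=0.5))               # [1, 3]
```

    The design difference is that mask-predict fixes the number of re-masked tokens in advance, whereas a threshold lets that number adapt to how confident the model already is.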
    Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
    Naveen Ari
    Chung-Cheng Chiu
    Semih Yavuz
    Ruoming Pang
    Wei Li
    Colin Raffel
    Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Association for Computational Linguistics, Florence, Italy (2019), pp. 1313-1323
    Simultaneous machine translation begins to translate each source sentence before the source speaker is finished speaking, with applications to live and streaming scenarios. Simultaneous systems must carefully schedule their reading of the source sentence to balance quality against latency. We present the first simultaneous translation system to learn an adaptive schedule jointly with a neural machine translation (NMT) model that attends over all source tokens read thus far. We do so by introducing Monotonic Infinite Lookback (MILk) attention, which maintains both a hard, monotonic attention head to schedule the reading of the source sentence, and a soft attention head that extends from the monotonic head back to the beginning of the source. We show that MILk’s adaptive schedule allows it to arrive at latency-quality trade-offs that are favorable to those of a recently proposed wait-k strategy for many latency values.
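    The two heads described above can be illustrated for a single decoding step at inference time. A minimal sketch, assuming per-position stopping probabilities p_choose and soft-attention energies are already computed (both hypothetical inputs here), a 0.5 stopping threshold for the hard head, and a fallback to the last source token if the head never stops; training uses expected (soft) attention instead, which this sketch omits.

```python
import math
from typing import List, Tuple

def milk_inference_step(p_choose: List[float], energies: List[float],
                        prev: int) -> Tuple[int, List[float]]:
    """One decoding step of MILk-style attention at inference time (simplified sketch)."""
    # Hard monotonic head: never moves backward; advance from the previous stop
    # until the stopping probability reaches 0.5.
    stop = len(p_choose) - 1  # assumption: fall back to the last source token
    for j in range(prev, len(p_choose)):
        if p_choose[j] >= 0.5:
            stop = j
            break
    # Soft "infinite lookback" head: softmax over every source position up to the stop.
    window = energies[: stop + 1]
    m = max(window)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in window]
    z = sum(exps)
    return stop, [e / z for e in exps]

print(milk_inference_step([0.1, 0.7, 0.9], [0.0, 0.0, 5.0], prev=0))  # (1, [0.5, 0.5])
```

    The stop position determines how much of the source has been read (stop + 1 tokens), so the learned p_choose values act as the adaptive read/write schedule.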