Daniel Cer
Authored Publications
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
We provide the first exploration of sentence embeddings from text-to-text transformers (T5), including the effects of scaling up sentence encoders to 11B parameters. Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for constructing Sentence-T5 (ST5) models: two that use only the T5 encoder and one that uses the full T5 encoder-decoder. We establish a new sentence representation transfer benchmark, SentGLUE, which extends the SentEval toolkit to nine tasks from the GLUE benchmark. Our encoder-only models outperform the previous best models on both SentEval and SentGLUE transfer tasks, including semantic textual similarity (STS). Scaling ST5 up from millions to billions of parameters is shown to consistently improve performance. Finally, our encoder-decoder method achieves a new state-of-the-art on STS when using sentence embeddings.
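As a rough illustration of the encoder-only construction described above, the sketch below mean-pools T5 encoder outputs into a fixed-length sentence vector. It uses the Hugging Face transformers library with the t5-base checkpoint as an illustrative stand-in; it is not the authors' released ST5 code or weights.

```python
# Minimal sketch of an encoder-only sentence embedding in the spirit of ST5:
# mean-pool the T5 encoder's token representations, then L2-normalize.
# Assumes the Hugging Face transformers library; "t5-base" is illustrative.
import torch
from transformers import T5EncoderModel, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state        # (batch, seq_len, dim)
    mask = batch.attention_mask.unsqueeze(-1).float()      # zero out padding tokens
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return torch.nn.functional.normalize(pooled, dim=-1)

embs = embed(["A cat sits on the mat.", "A kitten rests on a rug."])
print((embs[0] @ embs[1]).item())   # cosine similarity between the two sentences
```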
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
Tu Vu
Aditya Barua
Mohit Iyyer
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)
In this paper, we explore the challenging problem of performing a generative task in a target language when labeled data is only available in English, using summarization as a case study. We assume a strict setting with no access to parallel data or machine translation and find that common transfer learning approaches struggle in this setting, as a generative multilingual model fine-tuned purely on English catastrophically forgets how to generate non-English. Given the recent rise of parameter-efficient adaptation techniques, we conduct the first investigation into how one such method, prompt tuning (Lester et al., 2021), can overcome catastrophic forgetting to enable zero-shot cross-lingual generation. Our experiments show that parameter-efficient prompt tuning provides gains over standard fine-tuning when transferring between less-related languages, e.g., from English to Thai. However, a significant gap still remains between these methods and fully-supervised baselines. To improve cross-lingual transfer further, we explore several approaches, including: (1) mixing in unlabeled multilingual data, and (2) explicitly factoring prompts into recombinable language and task components. Our approaches can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
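To make the prompt-factoring idea in (2) concrete, here is a schematic PyTorch/transformers sketch in which a frozen multilingual model receives a learned task prompt concatenated with a learned language prompt ahead of its input embeddings. The checkpoint, prompt lengths, and initialization are illustrative assumptions, not the paper's implementation.

```python
# Schematic sketch of prompt tuning with a factored soft prompt
# (task component + language component) prepended to a frozen mT5 model.
# Checkpoint and sizes are illustrative; this is not the paper's code.
import torch
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
for p in model.parameters():                  # the backbone stays frozen
    p.requires_grad = False

d_model = model.config.d_model
task_prompt = torch.nn.Parameter(torch.randn(50, d_model) * 0.5)  # e.g. "summarize"
lang_prompt = torch.nn.Parameter(torch.randn(50, d_model) * 0.5)  # e.g. "Thai"

def prompt_tuning_loss(input_ids, attention_mask, labels):
    """Compute the seq2seq loss with only the two prompts receiving gradients."""
    inputs_embeds = model.get_input_embeddings()(input_ids)
    prompt = torch.cat([task_prompt, lang_prompt], dim=0)          # recombinable parts
    prompt = prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, inputs_embeds], dim=1)
    prompt_mask = torch.ones(input_ids.size(0), prompt.size(1),
                             dtype=attention_mask.dtype)
    attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
    return model(inputs_embeds=inputs_embeds,
                 attention_mask=attention_mask, labels=labels).loss
```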
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
Tu Vu
Rami Al-Rfou
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2022)
There has been growing interest in parameter-efficient methods to apply pre-trained language models to downstream tasks. Building on the Prompt Tuning approach of Lester et al. (2021), which learns task-specific soft prompts to condition a frozen pre-trained model to perform different tasks, we propose a novel prompt-based transfer learning approach called SPoT: Soft Prompt Transfer. SPoT first learns a prompt on one or more source tasks and then uses it to initialize the prompt for a target task. We show that SPoT significantly boosts the performance of Prompt Tuning across many tasks. More remarkably, across all model sizes, SPoT matches or outperforms standard Model Tuning (which fine-tunes all model parameters) on the SuperGLUE benchmark, while using up to 27,000x fewer task-specific parameters. To understand where SPoT is most effective, we conduct a large-scale study on task transferability with 26 NLP tasks in 160 combinations, and demonstrate that many tasks can benefit each other via prompt transfer. Finally, we propose an efficient retrieval approach that interprets task prompts as task embeddings to identify similar tasks and predict the most transferable source tasks for a novel target task.
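A minimal sketch of the two ingredients described above, prompt transfer and retrieval of transferable source tasks via prompt-based task embeddings, is shown below. The prompt tensors, task names, and dimensions are placeholders rather than prompts actually learned with prompt tuning.

```python
# Toy sketch of SPoT's two ideas, with random tensors standing in for prompts
# that would normally be learned via prompt tuning on real tasks.
import torch

prompt_len, d_model = 100, 768                        # illustrative sizes

# (1) Transfer: a prompt trained on a source task initializes the target prompt,
#     which is then further tuned on the target task.
source_prompt = torch.randn(prompt_len, d_model)      # pretend: learned on a source task
target_prompt = torch.nn.Parameter(source_prompt.clone())

# (2) Retrieval: treat each task's learned prompt as a task embedding and rank
#     candidate source tasks by cosine similarity to the target task's prompt.
def task_embedding(prompt):
    return torch.nn.functional.normalize(prompt.mean(dim=0), dim=0)

prompt_library = {name: torch.randn(prompt_len, d_model)
                  for name in ["mnli", "squad", "sst2"]}   # placeholder task names
query = task_embedding(torch.randn(prompt_len, d_model))   # novel target task
ranked = sorted(prompt_library,
                key=lambda t: -(task_embedding(prompt_library[t]) @ query))
print(ranked)   # source prompts ordered by predicted transferability
```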
Universal Sentence Encoder
Yinfei Yang
Sheng-yi Kong
Nan Hua
Nicole Lyn Untalan Limtiaco
Rhomni St. John
Mario Guajardo-Céspedes
Steve Yuan
Chris Tar
Brian Strope
Ray Kurzweil
In submission to: EMNLP demonstration, Association for Computational Linguistics, Brussels, Belgium (2018)
We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word-level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word-level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
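For readers who want to try a released encoder, usage from TF Hub typically looks like the snippet below; the exact module handle and version are illustrative and may differ from the modules that accompany this paper.

```python
# Loading a Universal Sentence Encoder module from TF Hub and embedding text.
# The module handle/version shown is illustrative; check TF Hub for current ones.
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
sentences = ["How old are you?", "What is your age?"]
vectors = embed(sentences).numpy()          # one 512-dim vector per sentence

a, b = vectors
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # cosine similarity
```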
SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation
Mona Diab
Eneko Agirre
Iñigo Lopez-Gazpio
Lucia Specia
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for Computational Linguistics, Vancouver, Canada, pp. 1-14
Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).
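STS shared tasks conventionally score systems by the Pearson correlation between predicted similarity scores and the 0-5 gold labels; a toy version of that evaluation, with made-up numbers, is sketched below.

```python
# Toy STS evaluation: Pearson correlation between system scores and gold labels.
# The numbers are made up purely for illustration.
from scipy.stats import pearsonr

gold = [5.0, 3.2, 1.0, 4.4, 0.0]   # human-annotated similarity on the 0-5 scale
pred = [4.7, 2.9, 1.5, 4.0, 0.6]   # system scores, e.g. rescaled cosine similarities
r, _ = pearsonr(gold, pred)
print(f"Pearson r = {r:.3f}")
```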
SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation
Eneko Agirre
Carmen Banea
Mona Diab
Aitor Gonzalez-Agirre
Rada Mihalcea
German Rigau
Janyce Wiebe
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, San Diego, California, pp. 497-511
Semantic Textual Similarity (STS) is the degree of semantic equivalence between two snippets of text. Similarity is expressed on an ordinal scale that spans from semantic equivalence to the two texts being completely dissimilar, with intermediate values capturing specifically defined levels of partially overlapping similarity. While prior evaluations constrained themselves to monolingual snippets of text, the 2016 shared task includes a pilot sub-task on computing semantic similarity on cross-lingual text snippets. This year's traditional monolingual sub-task includes the evaluation of English text snippets from the following four domains: Plagiarism Detection, Post-Edited Machine Translations, Question-Answering, and News Article Headlines. From the question-answering domain we included both question-question and answer-answer pairs. The cross-lingual task provides paired English-Spanish text snippets drawn from the same sources as the monolingual data as well as independently sampled news data. The monolingual task attracted 42 participating teams producing 118 system submissions, while the cross-lingual pilot task attracted 24 teams submitting 26 systems.
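The ordinal scale referred to above runs from 5 (the texts mean the same thing) down to 0 (completely dissimilar); the snippet below paraphrases those levels as a lookup table. The wording is a summary for illustration, not the official annotation guidelines verbatim.

```python
# Paraphrased STS similarity scale as a lookup table (summary wording, not the
# official annotation guidelines).
STS_SCALE = {
    5: "Completely equivalent in meaning.",
    4: "Mostly equivalent; some unimportant details differ.",
    3: "Roughly equivalent; some important information differs or is missing.",
    2: "Not equivalent, but share some details.",
    1: "Not equivalent, but on the same topic.",
    0: "Completely dissimilar.",
}

def describe(score: float) -> str:
    """Map a (possibly fractional) similarity score to its nearest scale label."""
    return STS_SCALE[min(5, max(0, round(score)))]

print(describe(3.6))   # -> "Mostly equivalent; some unimportant details differ."
```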