Isaac R Caswell

Connecting Language Technologies with Rich, Diverse Data Sources Covering Thousands of Languages

Sebastian Ruder

Julia Kreutzer

Clara Rivera

Ishank Saxena

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (to appear)

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Sebastian Ruder

Jon Clark

Alexander Gutkin

Mihir Sanjay Kale

Min Ma

Massimo Nicosia

Shruti Rijhwani

Parker Riley

Jean-Michel Sarr

Cindy Wang

John Wieting

Nitish Gupta

Anna Katanova

Christo Kirov

Dana L. Dickinson

Brian Roark

Bidisha Samanta

Connie Tao

David Adelani

Vera Axelrod

Isaac Caswell

Colin Cherry

Dan Garrette

Reeve Ingle

Melvin Johnson

Dmitry Panteleev

Partha Talukdar

Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics, Singapore, pp. 1856-1884

BiLex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation

Alex Jones

Isaac Caswell

Orhan Firat

ArXiv (2023)

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Julia Kreutzer

Isaac Caswell

Lisa Wang

Ahsan Wahab

Daan van Esch

Nasanbayar Ulzii-Orshikh

Allahsera Auguste Tapo

Nishant Subramani

Artem Sokolov

Claytone Sikasote

Monang Setyawan

Supheakmungkol Sarin

Sokhar Samb

Benoît Sagot

Clara E. Rivera

Annette Rios

Isabel Papadimitriou

Salomey Osei

Pedro Javier Ortiz Suárez

Iroro Fred Ọ̀nọ̀mẹ̀ Orife

Kelechi Ogueji

Rubungo Andre Niyongabo

Toan Nguyen

Mathias Müller

André Müller

Shamsuddeen Hassan Muhammad

Nanda Muhammad

Ayanda Mnyakeni

Jamshidbek Mirzakhalov

Tapiwanashe Matangira

Colin Leong

Nze Lawson

Sneha Kudugunta

Yacine Jernite

Mathias Jenny

Orhan Firat

Bonaventure F. P. Dossou

Sakhile Dlamini

Nisansa de Silva

Sakine Çabuk Ballı

Stella Biderman

Alessia Battisti

Ahmed Baruwa

Ankur Bapna

Pallavi Baljekar

Israel Abebe Azime

Ayodele Awokoya

Duygu Ataman

Orevaoghene Ahia

Oghenefego Ahia

Sweta Agrawal

Mofetoluwa Adeyemi

TACL (2022)

Building Machine Translation Systems for the Next Thousand Languages

Ankur Bapna

Isaac Caswell

Julia Kreutzer

Orhan Firat

Daan van Esch

Aditya Siddhant

Mengmeng Niu

Pallavi Nikhil Baljekar

Xavier Garcia

Wolfgang Macherey

Theresa Breiner

Vera Saldinger Axelrod

Jason Riesa

Yuan Cao

Mia Chen

Klaus Macherey

Maxim Krikun

Pidong Wang

Alexander Gutkin

Apu Shah

Yanping Huang

Zhifeng Chen

Yonghui Wu

Macduff Richard Hughes

Google Research (2022)

Writing System and Speaker Metadata for 2,800+ Language Varieties

Daan van Esch

Tamar Lucassen

Sebastian Ruder

Isaac Caswell

Clara E. Rivera

Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France (2022), pp. 5035-5046

Learning a Multi-Domain Curriculum for Neural Machine Translation

Wei Wang

Ye Tian

Jiquan Ngiam

Yinfei Yang

Isaac Caswell

Zarana Parekh

ACL (2020)

Translationese as a Language in “Multilingual” NMT

Parker Riley

Isaac Caswell

Markus Freitag

David Grangier

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online (2020), pp. 7737-7746

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

Isaac Caswell

Theresa Breiner

Daan van Esch

Ankur Bapna

COLING (2020)

BLEU might be Guilty but References are not Innocent

Markus Freitag

David Grangier

Isaac Caswell

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, pp. 61-71

Tagged Back-Translation

Isaac Caswell

Ciprian Chelba

David Grangier

ACL (2019)

Dynamically Composing Domain-Data Selection with Clean-Data Selection by "Co-Curricular Learning" for Neural Machine Translation

Wei Wang

Isaac Caswell

Ciprian Chelba

The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

APE at Scale and its Implications on MT Evaluation Biases

Markus Freitag

Isaac Caswell

Scott Roy

Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers), Association for Computational Linguistics, Florence, Italy (2019), pp. 34-44

Investigating Multilingual NMT Representations at Scale

Sneha Reddy Kudugunta

Ankur Bapna

Isaac Caswell

Naveen Arivazhagan

Orhan Firat

EMNLP (2019)
