Language

We advance the state of the art in natural language technologies and build systems that learn to understand and generate language in context.

About the team

Our team comprises multiple research groups working on a wide range of natural language understanding and generation projects. We pursue long-term research to develop novel capabilities that can address the needs of current and future Google products. We publish frequently and evaluate our methods on established scientific benchmarks (e.g., SQuAD, GLUE, SuperGLUE) or develop new ones for measuring progress (e.g., Conceptual Captions, Natural Questions, TyDi QA). We collaborate with other teams across Google to deploy our research to the benefit of our users. Our product contributions often stretch the boundaries of what is technically possible. Applications of our research have resulted in better language capabilities across all major Google products.

Our researchers are experts in natural language processing and machine learning with varied backgrounds and a passion for language. Computer scientists and linguists work hand in hand to define language tasks, collect valuable data, and support internationalization. Researchers and engineers work together to develop new neural network models that are sensitive to the nuances of language while taking advantage of the latest advances in specialized compute hardware (e.g., TPUs) to produce scalable solutions that can be used by billions of users.

Team focus summaries

Language representations

Question answering

Document understanding

Dialogue

Generation

Multilinguality

Language & vision

Translation

Summarization

Classification

Speech and language algorithms

Entities, relations, and reasoning

Grounded language understanding

Semantic parsing

Sentiment analysis

Trustworthiness

Featured publications

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin

Ming-Wei Chang

Kenton Lee

Kristina N. Toutanova

NAACL 2019 (2018)

Natural Questions: a Benchmark for Question Answering Research

Tom Kwiatkowski

Jennimaria Palomaki

Olivia Redfield

Michael Collins

Ankur Parikh

Chris Alberti

Danielle Epstein

Illia Polosukhin

Matthew Kelcey

Jacob Devlin

Kenton Lee

Kristina N. Toutanova

Llion Jones

Ming-Wei Chang

Andrew Dai

Jakob Uszkoreit

Quoc Le

Slav Petrov

Transactions of the Association for Computational Linguistics (2019) (to appear)

BERT Rediscovers the Classical NLP Pipeline

Ian Tenney

Dipanjan Das

Ellie Pavlick

Association for Computational Linguistics (2019) (to appear)

Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning

Piyush Sharma

Nan Ding

Sebastian Goodman

Radu Soricut

ACL (2018)

Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

Christian Buck

Jannis Bulian

Massimiliano Ciaramita

Wojciech Paweł Gajewski

Andrea Gesmundo

Neil Houlsby

Wei Wang

Sixth International Conference on Learning Representations (2018)

Massively Multilingual Neural Machine Translation

Melvin Johnson

Orhan Firat

Roee Aharoni

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, pp. 3874-3884 (to appear)

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

Kellie Webster

Marta Recasens

Vera Axelrod

Jason Baldridge

Transactions of the Association for Computational Linguistics, 6 (2018), pp. 605-618

Matching the Blanks: Distributional Similarity for Relation Learning

Livio Baldini Soares

Nicholas Arthur FitzGerald

Jeffrey Ling

Tom Kwiatkowski

ACL 2019 - The 57th Annual Meeting of the Association for Computational Linguistics (2019) (to appear)

Counterfactual Fairness in Text Classification through Robustness

Sahaj Garg

Vincent Perot

Nicole Limtiaco

Ankur Taly

Ed H. Chi

Alex Beutel

AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES) (2019)

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation

Naveen Ari

Colin Andrew Cherry

Wolfgang Macherey

Chung-Cheng Chiu

Semih Yavuz

Ruoming Pang

Wei Li

Colin Raffel

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Association for Computational Linguistics, Florence, Italy (2019), pp. 1313-1323