We advance the state of the art in natural language technologies and build systems that learn to understand and generate language in context.
About the team
Our team comprises multiple research groups working on a wide range of natural language understanding and generation projects. We pursue long-term research to develop novel capabilities that can address the needs of current and future Google products. We publish frequently and evaluate our methods on established scientific benchmarks (e.g., SQuAD, GLUE, SuperGLUE) or develop new ones for measuring progress (e.g., Conceptual Captions, Natural Questions, TyDi QA). We collaborate with other teams across Google to deploy our research to the benefit of our users. Our product contributions often stretch the boundaries of what is technically possible. Applications of our research have resulted in better language capabilities across all major Google products.
Our researchers are experts in natural language processing and machine learning with varied backgrounds and a passion for language. Computer scientists and linguists work hand in hand to provide insight into ways to define language tasks, collect valuable data, and support internationalization. Researchers and engineers work together to develop new neural network models that are sensitive to the nuances of language while taking advantage of the latest advances in specialized compute hardware (e.g., TPUs) to produce scalable solutions that can be used by billions of users.
Highlighted projects
The COVID-19 Research Explorer is a semantic search interface on top of the COVID-19 Open Research Dataset (CORD-19), which includes more than 50,000 journal articles and preprints.
Neural networks enable people to ask questions in natural language and get answers from information stored in tables.
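To make this concrete, here is a minimal sketch of asking a question over a small table with a publicly available table-parsing checkpoint; the use of the Hugging Face transformers library, the "google/tapas-base-finetuned-wtq" checkpoint, and the toy table are illustrative assumptions rather than part of the announcement.

```python
# A minimal sketch of answering a question over a table with a TAPAS-style
# model via the Hugging Face transformers library. The library, checkpoint
# name, and toy table are illustrative assumptions, not part of the release.
import pandas as pd
import torch
from transformers import TapasForQuestionAnswering, TapasTokenizer

name = "google/tapas-base-finetuned-wtq"
tokenizer = TapasTokenizer.from_pretrained(name)
model = TapasForQuestionAnswering.from_pretrained(name)

# The tokenizer expects every table cell to be a string.
table = pd.DataFrame(
    {"City": ["Paris", "Berlin"], "Population": ["2148000", "3645000"]}
)
queries = ["Which city has the larger population?"]

inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Map the predicted cell logits back to (row, column) coordinates in the table.
coords, _ = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)
print([table.iat[c] for c in coords[0]])  # e.g. ['Berlin']
```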
We implemented an improved approach to reducing gender bias in Google Translate that uses a dramatically different paradigm: rewriting or post-editing the initial translation.
To encourage more research on multilingual learning, we introduce “XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization”, which covers 40 typologically diverse languages (spanning 12 language families) and includes nine tasks that collectively require reasoning about different levels of syntax or semantics.
We added the Street View panoramas referenced in the Touchdown dataset to the existing StreetLearn dataset, enabling the broader community to use Touchdown for research on vision-and-language navigation and spatial description resolution in Street View settings.
To encourage research on multilingual question answering, we released TyDi QA, a question answering corpus covering 11 typologically diverse languages.
We present a novel, open-sourced method for text generation that is less error-prone and can be handled by model architectures that are easier to train and faster to execute.
ALBERT is an upgrade to BERT that advances the state-of-the-art performance on 12 NLP tasks, including the competitive Stanford Question Answering Dataset (SQuAD v2.0) and the SAT-style reading comprehension RACE benchmark.
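As a quick illustration of trying ALBERT, the following is a minimal masked-token prediction sketch; loading the model through the Hugging Face transformers library with the "albert-base-v2" checkpoint is an assumption of this sketch, not how the release was originally distributed.

```python
# A minimal sketch of masked-token prediction with ALBERT via the Hugging
# Face transformers library; the library and the "albert-base-v2" checkpoint
# are illustrative assumptions, not part of the original release.
import torch
from transformers import AlbertForMaskedLM, AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMaskedLM.from_pretrained("albert-base-v2")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of the [MASK] token and decode the most likely filler.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax()))  # e.g. "paris"
```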
"We are released two new datasets for use in the research community: Paraphrase Adversaries from Word Scrambling (PAWS) in English, and PAWS-X", an extension of the PAWS dataset to six "typologically distinct languages: French, Spanish, German, Chinese, Japanese, and Korean".
In "Robust Neural Machine Translation with Doubly Adversarial Inputs" (ACL 2019), we propose an approach that uses generated adversarial examples to improve the stability of machine translation models against small perturbations in the input.
We released three new Universal Sentence Encoder multilingual modules with additional features and potential applications.
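The sketch below shows one way to embed text with a multilingual module from TensorFlow Hub; the specific module URL is one published variant, chosen here purely for illustration.

```python
# A minimal sketch of embedding text with a multilingual Universal Sentence
# Encoder module from TensorFlow Hub; the specific module URL is one of the
# published variants, chosen here for illustration.
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers ops the module depends on)

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/3")

# Semantically similar sentences in different languages land near each other
# in the shared embedding space.
embeddings = embed(["Hello, world!", "Hola, mundo!"])
print(embeddings.shape)  # (2, 512)
```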
To help spur research advances in question answering, we released Natural Questions, a new, large-scale corpus for training and evaluating open-domain question answering systems, and the first to replicate the end-to-end process in which people find answers to questions.