Natural Language Processing

Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used throughout Google and affect the user experience in search, mobile, apps, ads, translation, and more.

Our work spans the range of traditional NLP tasks, with general-purpose syntactic and semantic algorithms underpinning more specialized systems. We are particularly interested in algorithms that scale well and run efficiently in a highly distributed environment.

Our syntactic systems predict a part-of-speech tag for each word in a given sentence, as well as morphological features such as gender and number. They also label relationships between words, such as subject, object, and modification. We focus on efficient algorithms that leverage large amounts of unlabeled data, and have recently incorporated neural network models.
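As a rough illustration of the kind of output described above, the sketch below represents one hand-annotated sentence as a list of tokens, each carrying a part-of-speech tag, morphological features, and a labeled link to its head word. The `Token` class, the helper functions, and the tag and relation names (loosely modeled on Universal Dependencies) are assumptions made for this example; they do not describe Google's internal systems.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    index: int                 # 1-based position in the sentence
    text: str
    pos: str                   # part-of-speech tag (illustrative tag set)
    morph: dict = field(default_factory=dict)  # e.g. {"Number": "Plur"}
    head: int = 0              # index of the governing token; 0 marks the root
    relation: str = "root"     # label of the link to the head


# Hand-annotated example; a real system would predict these values.
SENTENCE = [
    Token(1, "She", "PRON", {"Number": "Sing", "Gender": "Fem"}, head=2, relation="nsubj"),
    Token(2, "reads", "VERB", {"Number": "Sing"}, head=0, relation="root"),
    Token(3, "long", "ADJ", {}, head=4, relation="amod"),
    Token(4, "books", "NOUN", {"Number": "Plur"}, head=2, relation="obj"),
]


def root(sentence):
    """Return the token that has no head."""
    return next(t for t in sentence if t.head == 0)


def dependents(sentence, head_index, relation):
    """Return the words attached to `head_index` with the given relation label."""
    return [t.text for t in sentence if t.head == head_index and t.relation == relation]


verb = root(SENTENCE)
print(verb.text, dependents(SENTENCE, verb.index, "nsubj"), dependents(SENTENCE, verb.index, "obj"))
```

Storing the head index and relation label on each token keeps the whole analysis in one flat list, which is a common way to exchange dependency parses.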

On the semantic side, we identify entities in free text, label them with types (such as person, location, or organization), cluster mentions of those entities within and across documents (coreference resolution), and resolve the entities to the Knowledge Graph.
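The steps in the paragraph above can be pictured with a small data-structure sketch: typed mentions from several documents are grouped by the entity they resolve to, which yields clusters within and across documents. Everything here is hypothetical, including the `Mention` class, the `cluster_by_entity` helper, and the identifiers `E1` and `E2`, which are placeholders rather than real Knowledge Graph IDs. The resolution step itself is assumed to have already happened.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    doc_id: str
    text: str
    entity_type: str   # e.g. "person", "location", "organization"
    entity_id: str     # placeholder identifier, not a real Knowledge Graph ID


# Hand-labeled example mentions; a real system would predict these.
MENTIONS = [
    Mention("doc1", "Ada Lovelace", "person", "E1"),
    Mention("doc1", "she", "person", "E1"),
    Mention("doc1", "London", "location", "E2"),
    Mention("doc2", "Lovelace", "person", "E1"),
]


def cluster_by_entity(mentions):
    """Group mentions that resolve to the same entity, across all documents."""
    clusters = defaultdict(list)
    for m in mentions:
        clusters[m.entity_id].append((m.doc_id, m.text))
    return dict(clusters)


for entity_id, cluster in cluster_by_entity(MENTIONS).items():
    print(entity_id, cluster)
```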

Recent work has focused on incorporating multiple sources of knowledge and information to aid the analysis of text, as well as on applying frame semantics at the noun-phrase, sentence, and document levels.

Recent Publications

How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequential Labeling?

Kazuma Hashimoto

Iftekhar Naim

Karthik Raman

EACL 2024 workshop on UncertaiNLP

Conformal Language Modeling

Victor Quach

Adam Fisch

Tal Schuster

Adam Yala

Jae Ho Sohn

Tommi Jaakkola

Regina Barzilay

ICLR (2024)

LinguaMeta: Unified Metadata for Thousands of Languages

Sandy Ritchie

Daan van Esch

Uche Okonkwo

Shikhar Vashishth

Emily Drummond

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Connecting Language Technologies with Rich, Diverse Data Sources Covering Thousands of Languages

Daan van Esch

Sandy Ritchie

Sebastian Ruder

Julia Kreutzer

Clara Rivera

Ishank Saxena

Isaac Caswell

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models

Daan van Esch

Proceedings of the Second Workshop on Computation and Written Language (CAWL) 2024

Multilingual Instruction Tuning With Just a Pinch of Multilinguality

Uri Shaham

Jonathan Herzig

Roee Aharoni

Idan Szpektor

Reut Tsarfaty

Matan Eyal

arXiv (2024)
