LaMDA: Language Models for Dialog Applications

James Qin
Noam Shazeer
Chung-ching Chang
Joe Fenton
Maarten Paul Bosma
Marc Joseph Pickett
Erin Hoffman-John
Kathy Meier-Hellstern
Vincent Y. Zhao
Marian Croak
Steven Zheng
Cosmo Du
Ravi Rajakumar
Taylor Bos
Tulsee Doshi
Jamie Hall
Ray Kurzweil
Will Rusch
Igor Krivokon
Marcelo Amorim Menegali
Alena Butryna
Johnny Soraker
Dehao Chen
Aaron Daniel Cohen
Ben Zevenbergen
Alicia Jin
Maxim Krikun
Toju Duke
Daniel De Freitas Adiwardana
Apoorv Kulshreshtha
Rachel Bernstein
Romal Thoppilan
Dmitry (Dima) Lepikhin
Yuanzhong Xu
arXiv (2022)

Abstract

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvement on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model’s responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.
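To make the candidate-filtering idea in the abstract concrete, here is a minimal sketch (not the paper's implementation) of the described flow: sample several responses from the dialog model, score each with a fine-tuned safety classifier, discard unsafe candidates, and return the highest-quality survivor. The functions `generate_candidates`, `safety_score`, and `quality_score`, and the threshold value, are hypothetical stand-ins, not LaMDA's actual API.

```python
# Minimal sketch of filtering candidate responses with a safety classifier,
# as described in the abstract. All model calls below are hypothetical stubs.

import random
from typing import List


def generate_candidates(context: str, num_candidates: int = 16) -> List[str]:
    """Stand-in for sampling multiple responses from the dialog model."""
    return [f"candidate response {i} to: {context}" for i in range(num_candidates)]


def safety_score(context: str, response: str) -> float:
    """Stand-in for the fine-tuned safety classifier; returns P(safe) in [0, 1]."""
    return random.random()


def quality_score(context: str, response: str) -> float:
    """Stand-in for a quality classifier used to re-rank candidates."""
    return random.random()


def respond(context: str, safety_threshold: float = 0.8) -> str:
    candidates = generate_candidates(context)
    # Drop candidates the safety classifier flags as likely unsafe.
    safe = [c for c in candidates if safety_score(context, c) >= safety_threshold]
    if not safe:
        # Fallback when no candidate passes the safety filter.
        return "I'd rather not answer that."
    # Re-rank the surviving candidates by quality and return the best one.
    return max(safe, key=lambda c: quality_score(context, c))


if __name__ == "__main__":
    print(respond("Tell me about black holes."))
```

The design point the abstract emphasizes is that safety is enforced by filtering at inference time with a classifier fine-tuned on a small amount of crowdworker-annotated data, rather than by relying on model scale alone.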