Confident Adaptive Language Modeling

Tal Schuster; Adam Fisch; Jai Prakash Gupta; Mostafa Dehghani; Dara Bahri; Vinh Quoc Tran; Yi Tay; Don Metzler

NeurIPS 2022

Abstract

Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks.
These gains come with a drastic increase in the models' size, leading to slow and costly use at inference time.
In practice, however, the generations produced by LLMs vary in difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute.
In this work, we introduce Confident Adaptive Language Modeling (CALM), a method for dynamically allocating different amounts of compute per example and per generation timestep.
Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to missing hidden representations due to early exits in previous tokens.
Through theoretical analysis and empirical experiments on three diverse generation tasks, we demonstrate the efficacy of our method in reliably reducing compute while maintaining high performance.
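
To make the idea of a per-token exit decision concrete, the following is a minimal sketch and not the method as implemented in the paper. It assumes one possible confidence measure (the gap between the two largest softmax probabilities) and a hypothetical input, `per_layer_logits`, holding the vocabulary logits that each decoder layer would produce for the current token. The function names and the threshold value are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gap(probs):
    """Assumed confidence measure: difference between the two largest probabilities."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def early_exit_predict(per_layer_logits, threshold):
    """Return (token_id, layers_used) for one generation timestep.

    per_layer_logits is a hypothetical list with one logits vector per layer.
    The loop stops at the first layer whose confidence reaches the threshold,
    and always stops at the last layer.
    """
    num_layers = len(per_layer_logits)
    for depth, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        if depth == num_layers or top2_gap(probs) >= threshold:
            token_id = max(range(len(probs)), key=probs.__getitem__)
            return token_id, depth


# An "easy" token exits after one layer; a "hard" token uses all three.
easy = [[5.0, 0.0, 0.0], [6.0, 0.0, 0.0], [7.0, 0.0, 0.0]]
hard = [[0.1, 0.0, 0.0], [0.2, 0.1, 0.0], [0.0, 3.0, 0.0]]
easy_result = early_exit_predict(easy, threshold=0.9)
hard_result = early_exit_predict(hard, threshold=0.9)
```

In this sketch a higher threshold trades compute for safety: more tokens run through the full stack, and fewer exit early. It does not address the other two challenges named above, namely tying the local threshold to a sequence-level quality constraint and handling attention over hidden states that were never computed for earlier tokens.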

Research Areas

Machine intelligence
