Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

Ashish Teku Vaswani
Dani Yogatama
Hyung Won Chung
Jinfeng Rao
Liam B. Fedus
Samira Abnar
Sharan Narang
Yi Tay
ICLR (2022)

Abstract

Kaplan et al. (2020) argue that the performance of a Transformer model depends strongly on model size but only weakly on model shape. Our work empirically confirms this result for upstream pre-training, but reveals a striking discrepancy when fine-tuning: downstream task performance is strongly influenced by model shape (e.g., depth and width). We find that widely adopted models, including T5-base, T5-large, and T5-XL/XXL (Raffel et al. 2019), are inefficient on the compute-performance Pareto frontier. To this end, we present an improved scaling protocol whereby our redesigned models achieve similar downstream fine-tuning quality while having 50% fewer parameters and training 40% faster than the widely adopted T5-base model. We conclude by demonstrating that our improved scaling protocol also holds in other domains.
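
The abstract's central contrast is between model size (total parameters) and model shape (how those parameters are allocated across depth and width). As a minimal, hypothetical sketch of that distinction, the Python snippet below compares two Transformer stacks with roughly equal parameter counts but different depth/width trade-offs; the simplified count covers only per-layer attention and feed-forward weights (ignoring embeddings, biases, and layer norms), and the specific configurations are illustrative rather than the paper's actual models.

# Hypothetical back-of-the-envelope parameter count for Transformer stacks.
# Simplification: only attention and feed-forward weights per layer are
# counted; embeddings, biases, and layer norms are ignored.

def layer_params(d_model: int, d_ff: int) -> int:
    """Approximate weights in one Transformer layer."""
    attention = 4 * d_model * d_model   # Q, K, V, and output projections
    feed_forward = 2 * d_model * d_ff   # the two dense layers of the FFN
    return attention + feed_forward

def stack_params(num_layers: int, d_model: int, d_ff: int) -> int:
    """Approximate weights in a stack of identical layers."""
    return num_layers * layer_params(d_model, d_ff)

# Two illustrative shapes with matched parameter budgets: a shallow-wide
# stack and a deep-narrow one (numbers chosen only for this example).
shallow_wide = stack_params(num_layers=12, d_model=1024, d_ff=4096)
deep_narrow = stack_params(num_layers=24, d_model=768, d_ff=2560)

print(f"shallow-wide: {shallow_wide / 1e6:.0f}M parameters")
print(f"deep-narrow:  {deep_narrow / 1e6:.0f}M parameters")

Under this simplified count, both stacks land at roughly 151M parameters despite having very different shapes; the paper's observation is that such shape differences matter little for pre-training quality but substantially for downstream fine-tuning quality.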