Efficient Training of Language Models using Few-Shot Learning

Sashank Reddi; Sobhan Miryoosefi; Stefani Karp; Shankar Krishnan; Satyen Kale; Seungyeon Kim; Sanjiv Kumar

ICML (2023)

Abstract

Large deep learning models have achieved state-of-the-art performance across various natural language processing (NLP) tasks and demonstrated remarkable few-shot learning performance. However, training them is often challenging and resource-intensive. In this paper, we study an efficient approach to training language models using few-shot learners. We show that, by leveraging the fast learning nature of few-shot learners, one can train language models efficiently in a stagewise manner. Our main insight is that stacking a good few-shot learner on a good small language model provides a good initializer for a larger language model. Using this insight and building upon progressive stacking approaches, we develop novel approaches for training such networks in a stagewise manner. Furthermore, we provide a theoretical framework and accompanying empirical studies to support our insights, thereby creating a theoretical foundation for progressive stacking. Finally, we provide empirical results to demonstrate the effectiveness of our approach in reducing the training time of few-shot learners.
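
To make the stacking idea concrete, here is a minimal sketch of how a deeper model might be initialized from a shallower one by copying its top layers. It illustrates generic progressive stacking only and is not the authors' exact procedure. The helper names (`init_layer`, `stack_top_layers`), the toy layer representation (a plain weight matrix), and the choice to copy the top `num_new` layers are all assumptions made for illustration.

```python
import copy
import random


def init_layer(width, rng):
    """Toy stand-in for a network layer: a width x width weight matrix (hypothetical)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(width)] for _ in range(width)]


def stack_top_layers(layers, num_new):
    """Return a deeper model whose extra layers are copies of the top `num_new` layers.

    Assumption: the new layers are initialized from the topmost layers of the
    smaller model; the paper may select or train the stacked block differently.
    """
    if not 0 < num_new <= len(layers):
        raise ValueError("num_new must be between 1 and len(layers)")
    grown = copy.deepcopy(layers)                    # keep the small model untouched
    grown.extend(copy.deepcopy(layers[-num_new:]))   # stack copies of the top block
    return grown


rng = random.Random(0)
small_model = [init_layer(4, rng) for _ in range(3)]    # stage 1: a 3-layer model
# ... stage-1 training of small_model would happen here ...
large_model = stack_top_layers(small_model, num_new=2)  # stage 2: a 5-layer initializer
```

In a stagewise schedule, `large_model` would then be trained further and the same growth step repeated until the target depth is reached. Deep copies are used so that later training of the larger model does not silently modify the smaller one.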
