Scaling Language Model Size in Cross-Device Federated Learning

Jae Hun Ro; Theresa Breiner; Lara McConnaughey; Mingqing Chen; Ananda Theertha Suresh; Shankar Kumar; Rajiv Mathews

FL4NLP@ACL2022 (2022) (to appear)

Abstract

Most studies in cross-device federated learning focus on small models because of server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. By systematically applying partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we train a 21M-parameter Transformer that matches the perplexity of a similarly sized LSTM at ~10x lower client-to-server communication cost, and that achieves 11% lower perplexity than the smaller LSTMs commonly studied in the literature.

Research Areas

Machine intelligence
