Large scale distributed neural network training through online distillation

Rohan Anil; Gabriel Pereyra; Alexandre Tachard Passos; Robert Ormandi; George Dahl; Geoffrey Hinton

ICLR (2018)

Abstract

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, they are seldom used because the multi-stage training setups they require are cumbersome, and the extra hyperparameters they introduce make tuning even more expensive. In this paper we explore a variant of distillation that is relatively straightforward to use because it does not require a complicated multi-stage setup. We also show that distillation can serve as a meaningful distributed learning algorithm: instead of independent workers exchanging gradients, which requires worrying about delays and synchronization, independent workers can exchange full model checkpoints. This can be done far less frequently than exchanging gradients, breaking one of the scalability barriers of stochastic gradient descent. We report experiments on Criteo clickthrough rate data and on the largest dataset used to date for neural language modeling, based on Common Crawl and containing $6\times 10^{11}$ tokens. In these experiments we show that we can scale at least $2\times$ as well as the maximum limit of distributed stochastic gradient descent. Finally, we show that online distillation can dramatically reduce the churn in the predictions between different versions of a model.
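To make the checkpoint-exchange idea concrete, here is a minimal sketch, not the paper's implementation. It assumes two workers that each train a softmax regression model on their own minibatches, and that every `exchange_every` steps each worker receives a copy of its peer's weights. The stale copy acts as a fixed teacher: each worker adds a distillation term, weighted by `alpha`, that pulls its predictions toward the teacher's. The function name `online_distill`, the two-worker setup, the linear model, and all hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def online_distill(X, y, num_classes, steps=300, exchange_every=50,
                   alpha=0.5, lr=0.1, batch=32, seed=0):
    """Hypothetical two-worker sketch of distillation with checkpoint exchange."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    Y = np.eye(num_classes)[y]  # one-hot labels

    # Each worker starts from its own random initialization.
    W = [rng.normal(scale=0.01, size=(dim, num_classes)) for _ in range(2)]
    # stale[i] is worker i's most recent copy of its peer's weights.
    # It stays None until the first exchange, so early steps use labels only.
    stale = [None, None]

    for step in range(steps):
        # Infrequent communication: swap full checkpoints, not gradients.
        if step > 0 and step % exchange_every == 0:
            stale = [W[1].copy(), W[0].copy()]

        for i in range(2):
            # Each worker samples its own minibatch.
            idx = rng.integers(0, len(X), size=batch)
            xb, yb = X[idx], Y[idx]
            p = softmax(xb @ W[i])

            # Gradient of cross-entropy(labels, p) with respect to the logits.
            grad_logits = p - yb
            if stale[i] is not None:
                # Teacher predictions from the stale peer checkpoint,
                # treated as constants (no gradient flows into the teacher).
                q = softmax(xb @ stale[i])
                # Gradient of alpha * cross-entropy(q, p) w.r.t. the logits.
                grad_logits = grad_logits + alpha * (p - q)

            W[i] -= lr * (xb.T @ grad_logits) / batch

    return W
```

Because the teacher weights are only refreshed every `exchange_every` steps, the workers communicate far less often than they would if they synchronized gradients on every step, which is the property the abstract highlights.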

Research Areas

Machine intelligence
