A  Bayesian Perspective on Generalization and Stochastic Gradient Descent

Sam Smith; Quoc V. Le

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

Sam Smith

Quoc V. Le

ICLR (2018)

Download Google Scholar

Abstract

Zhang et al. (2016) argued that understanding deep learning requires rethinking
generalization. To justify this claim, they showed that deep networks can easily
memorize randomly labeled training data, despite generalizing well when shown
real labels of the same inputs. We show here that the same phenomenon occurs
in small linear models with fewer than a thousand parameters; however there is
no need to rethink anything, since our observations are explained by evaluating
the Bayesian evidence in favor of each model. This Bayesian evidence penalizes
sharp minima. We also explore the “generalization gap” observed between small
and large batch training, identifying an optimum batch size which scales linearly
with both the learning rate and the size of the training set. Surprisingly, in our
experiments the generalization gap was closed by regularizing the model.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

A Bayesian Perspective on Generalization and Stochastic Gradient Descent

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs