Train faster, generalize better: Stability of stochastic gradient descent

Benjamin Recht; Moritz Hardt; Yoram Singer

arXiv (2015)

Abstract

We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. Our results apply to both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Our bounds hold in cases where existing uniform convergence bounds do not apply, for instance, if there is no explicit form of regularization and the model capacity far exceeds the sample size. Conceptually, our findings help explain the widely observed empirical success of training large models with gradient descent methods. They further underscore the importance of reducing training time beyond the obvious benefit of saving time.
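The stability idea in the title can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes a logistic loss with unit-norm features (so the loss is convex, smooth, and 1-Lipschitz), a constant step size, and hypothetical helper names such as `stability_gap`. It runs SGD with the same sampling order on two datasets that differ in a single example and reports how far apart the resulting parameters end up. Under these assumptions, only the steps that sample the differing example can push the two runs apart, each by at most `2 * lr`, so fewer iterations should leave less room for the runs to diverge.

```python
import math
import random


def logistic_grad(w, x, y):
    """Gradient of log(1 + exp(-y * <w, x>)) with respect to w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable sigmoid(-margin).
    if margin >= 0:
        e = math.exp(-margin)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(margin))
    return [-y * s * xi for xi in x]


def random_example(rng, dim):
    """Draw a unit-norm feature vector and a random +/-1 label (assumed data model)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x], rng.choice([-1.0, 1.0])


def sgd(data, indices, lr, dim):
    """Plain SGD from w = 0, visiting examples in the given index order."""
    w = [0.0] * dim
    for i in indices:
        x, y = data[i]
        g = logistic_grad(w, x, y)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def stability_gap(n, steps, lr=0.5, dim=5, seed=0):
    """Return (parameter distance, number of steps that hit the differing example)."""
    rng = random.Random(seed)
    data = [random_example(rng, dim) for _ in range(n)]
    neighbor = list(data)
    neighbor[0] = random_example(rng, dim)  # the two datasets differ only at index 0
    indices = [rng.randrange(n) for _ in range(steps)]  # shared sampling order
    w_a = sgd(data, indices, lr, dim)
    w_b = sgd(neighbor, indices, lr, dim)
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))
    return gap, indices.count(0)


if __name__ == "__main__":
    for steps in (50, 500, 5000):
        gap, hits = stability_gap(n=200, steps=steps)
        print(f"steps={steps:5d}  hits={hits:3d}  gap={gap:.4f}  bound={2 * 0.5 * hits:.1f}")
```

Calling `stability_gap` with a larger `n` or fewer `steps` makes it less likely that the differing example is sampled at all, which is one informal way to read the claim that short training runs depend only weakly on any single training point.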

Research Areas

Machine intelligence
