
Train faster, generalize better: Stability of stochastic gradient descent

arXiv (2015)

Abstract

We show that any model trained by a stochastic gradient method for a small number of iterations has vanishing generalization error. Our results apply to both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Our bounds hold in cases where existing uniform convergence bounds do not apply, for instance, when there is no explicit regularization and the model capacity far exceeds the sample size. Conceptually, our findings help explain the widely observed empirical success of training large models with gradient descent methods. They further underscore the importance of reducing training time beyond its obvious benefit.
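
For context (the notation below is an illustrative sketch and does not appear in the abstract): the stochastic gradient method updates the parameters using one sampled training example per step, and the generalization error being bounded is the expected gap between the population risk and the empirical risk of the returned iterate,

\[
  w_{t+1} = w_t - \alpha_t \,\nabla_w f(w_t;\, z_{i_t}),
  \qquad
  \varepsilon_{\mathrm{gen}} = \mathbb{E}\big[\, R(w_T) - R_S(w_T) \big],
\]

where $f(\cdot;\, z)$ denotes the loss on example $z$, $z_{i_t}$ the example sampled at step $t$, $\alpha_t$ the step size, $R$ the population risk, and $R_S$ the empirical risk on the training sample $S$.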
