Benign Underfitting of SGD in Stochastic Convex Optimization

Roi Livni
Uri Sherman
NeurIPS 2022

Abstract

A natural, seemingly undisputed understanding is that a learning algorithm must obtain a good fit to the training data in order to perform well on independent test data. This view is reflected both in the traditional bias-complexity trade-off and in the newly emerging theory of generalization of interpolating learning rules. In this work, we ask to what extent stochastic gradient descent (SGD) may be similarly understood to generalize by means of fit to the training data. We consider the fundamental stochastic convex optimization framework, and seek bounds on the empirical risk and generalization gap of one-pass SGD. Surprisingly, we discover that there exist convex learning problems where the output of SGD exhibits an empirical risk and generalization gap that are both $\Omega(1)$, yet generalizes well nonetheless, with population risk $O(1/\sqrt{n})$ as guaranteed by the classical analysis. Consequently, SGD is not algorithmically stable in \emph{any} sense, and its generalization cannot be explained by uniform convergence or, for that matter, any other currently known generalization bound technique (other than its classical analysis). We then analyze the variant of SGD given by sampling \emph{with} replacement from the training set, for which we prove that, contrary to what might be expected, overfitting does not occur and the population risk converges at the optimal rate. Finally, we study empirical risk guarantees of multiple epochs of without-replacement SGD, and derive nearly tight upper and lower bounds that significantly improve over previously known results in this setting.
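To make the two sampling schemes contrasted above concrete, here is a minimal sketch of one-pass SGD under without-replacement (a single random pass over the data) versus with-replacement sampling, reporting the empirical risk and a fresh-sample estimate of the population risk of the averaged iterate. The problem instance (squared loss on a linear predictor, Gaussian data) and the $1/\sqrt{n}$ step size are illustrative assumptions, not the paper's constructions; in particular, this benign toy problem will not reproduce the $\Omega(1)$ lower bounds, which rely on carefully crafted adversarial instances.

```python
# Hypothetical illustration (not the paper's construction): one-pass SGD on a
# simple convex problem, under the two sampling schemes discussed above.
import numpy as np

rng = np.random.default_rng(0)

def loss(w, x, y):
    # Squared loss of a linear predictor: convex in w.
    return 0.5 * (x @ w - y) ** 2

def grad(w, x, y):
    return (x @ w - y) * x

def sgd(X, Y, steps, eta, with_replacement):
    n, d = X.shape
    w = np.zeros(d)
    iterates = [w]
    order = rng.permutation(n)  # fixed pass order for without-replacement
    for t in range(steps):
        i = rng.integers(n) if with_replacement else order[t % n]
        w = w - eta * grad(w, X[i], Y[i])
        iterates.append(w)
    # The classical analysis bounds the risk of the *averaged* iterate.
    return np.mean(iterates, axis=0)

# Synthetic convex learning problem: y = <w*, x> + noise (an assumption).
n, d = 200, 10
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d)) / np.sqrt(d)
Y = X @ w_star + 0.1 * rng.normal(size=n)

eta = 1.0 / np.sqrt(n)  # step size from the classical O(1/sqrt(n)) analysis
X_test = rng.normal(size=(5000, d)) / np.sqrt(d)
Y_test = X_test @ w_star + 0.1 * rng.normal(size=5000)

for wr in (False, True):
    w_bar = sgd(X, Y, steps=n, eta=eta, with_replacement=wr)
    emp = np.mean([loss(w_bar, X[i], Y[i]) for i in range(n)])
    # Fresh samples stand in for the population risk.
    pop = np.mean([loss(w_bar, X_test[i], Y_test[i]) for i in range(5000)])
    print(f"with_replacement={wr}: empirical risk {emp:.4f}, "
          f"population risk {pop:.4f}")
```

On a well-behaved instance like this one, both variants fit and generalize; the paper's point is that for one-pass without-replacement SGD this agreement between empirical and population risk is not guaranteed in general.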