Google Research

On the interplay between noise and curvature and its effect on optimization and generalization

Proceedings of the 23rdInternational Conference on Artificial Intelligence and Statistics (AISTATS) (2020)


This work revisits the notion of \textit{information criterion} to characterize generalization for modern deep learning models. In particular, we empirically demonstrate the effectiveness of the Takeuchi Information Criterion, an extension of the Akaike Information Criterion for misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss $\rmH$ and the covariance matrix of the gradients $\rmSS$. By exploring the semantic and numerical similarities and differences between these two matrices as well as the Fisher information matrix $\rmF$, we bring further evidence that flatness cannot in itself predict generalization. We also address the question of when is $\rmSS$ a reasonable approximation to $\rmF$, as commonly assumed.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work