- Valentin Thomas
- Fabian Pedregosa
- Bart van Merriënboer
- Pierre-Antoine Manzagol
- Yoshua Bengio
- Nicolas Le Roux
Abstract
This work revisits the notion of *information criterion* to characterize generalization for modern deep learning models. In particular, we empirically demonstrate the effectiveness of the Takeuchi Information Criterion (TIC), an extension of the Akaike Information Criterion (AIC) to misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss $\mathrm{H}$ and the covariance matrix of the gradients $\mathrm{S}$. By exploring the semantic and numerical similarities and differences between these two matrices, as well as the Fisher information matrix $\mathrm{F}$, we bring further evidence that flatness cannot in itself predict generalization. We also address the question of when $\mathrm{S}$ is a reasonable approximation to $\mathrm{F}$, as is commonly assumed.
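For context, a standard statement of Takeuchi's criterion is sketched below; the notation ($f$ for the model density, $\hat{\theta}$ for the fitted parameters, $n$ for the sample size) and the exact scaling are illustrative assumptions, not quoted from the paper:

$$
\mathrm{TIC} = -2\sum_{i=1}^{n}\log f(x_i \mid \hat{\theta}) + 2\,\operatorname{tr}\big(\hat{\mathrm{I}}\,\hat{\mathrm{J}}^{-1}\big),
\quad\text{where}\quad
\hat{\mathrm{J}} = -\frac{1}{n}\sum_{i=1}^{n}\nabla^2_{\theta}\log f(x_i \mid \hat{\theta}),
\qquad
\hat{\mathrm{I}} = \frac{1}{n}\sum_{i=1}^{n}\nabla_{\theta}\log f(x_i \mid \hat{\theta})\,\nabla_{\theta}\log f(x_i \mid \hat{\theta})^{\top}.
$$

With the negative log-likelihood as the loss, $\hat{\mathrm{J}}$ is the Hessian $\mathrm{H}$ and $\hat{\mathrm{I}}$ the uncentered gradient covariance $\mathrm{S}$, so the penalty term is $\operatorname{tr}(\mathrm{S}\mathrm{H}^{-1})$. For a well-specified model, $\mathrm{S} = \mathrm{H} = \mathrm{F}$ at the optimum and the penalty reduces to the number of parameters, recovering the AIC; this is why the parameter count alone stops being informative once the model is misspecified.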