Finite Versus Infinite Neural Networks: an Empirical Study

Jaehoon Lee
Sam S. Schoenholz
Jeffrey Pennington
Ben Adlam
Roman Novak
Jascha Sohl-Dickstein
NeurIPS 2020

Abstract

We perform a careful, thorough, and large-scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully connected finite-width networks, but underperform convolutional finite-width networks; neural network Gaussian process (NNGP) kernels frequently outperform neural tangent (NT) kernels; ensembles of finite networks have reduced posterior variance and behave more similarly to infinite networks; weight decay and the use of a large learning rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite-width networks; finite network performance depends non-monotonically on width in ways not captured by double descent phenomena. Our experiments additionally motivate an improved layer-wise scaling for weight decay, which improves generalization in finite-width networks. Finally, we develop improved best practices for using NNGP and NT kernels for prediction. Using these best practices, we achieve state-of-the-art results for non-trainable kernels on CIFAR-10 classification tasks.
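To make the NNGP and NT kernel prediction setting concrete, the sketch below shows how infinite-width kernel regression for a fully connected architecture could be computed with the open-source Neural Tangents library. This is an illustrative assumption, not the paper's exact pipeline: the architecture, the random placeholder data (standing in for flattened CIFAR-10 images and one-hot labels), and the diagonal regularization value are all hypothetical choices made for the example.

    import jax.numpy as jnp
    from jax import random
    import neural_tangents as nt
    from neural_tangents import stax

    # Hypothetical placeholder data standing in for flattened CIFAR-10
    # images (3072 features) and one-hot labels over 10 classes.
    key = random.PRNGKey(0)
    x_train = random.normal(key, (64, 3072))
    y_train = jnp.eye(10)[random.randint(key, (64,), 0, 10)]
    x_test = random.normal(random.PRNGKey(1), (16, 3072))

    # Infinite-width fully connected network; the same kernel_fn yields
    # both the NNGP kernel and the neural tangent kernel.
    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512), stax.Relu(),
        stax.Dense(512), stax.Relu(),
        stax.Dense(10),
    )

    # Closed-form kernel regression with a small diagonal regularizer
    # added to the training kernel (value chosen arbitrarily here).
    predict_fn = nt.predict.gradient_descent_mse_ensemble(
        kernel_fn, x_train, y_train, diag_reg=1e-4)

    # Test-set predictions under the NNGP and NT kernels.
    y_nngp, y_ntk = predict_fn(x_test=x_test, get=('nngp', 'ntk'))

In this setup, comparing y_nngp and y_ntk against predictions of finite-width networks trained on the same data is one way to probe the finite-versus-infinite correspondence studied in the paper.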
