The Effect of Network Depth on the Optimization Landscape

Behrooz Ghorbani
Shankar Krishnan
Ying Xiao
Deep Phenomena Workshop (ICML 2019)

Abstract

It is well-known that deeper neural networks are harder to train than shallower ones. In this short paper, we use the (full) eigenvalue spectrum of the Hessian to explore how the loss landscape changes as the network gets deeper, and as residual connections are added to the architecture. Computing a series of quantitative measures on the Hessian spectrum, we show that the Hessian eigenvalue distribution in deeper networks has substantially heavier tails (equivalently, more outlier eigenvalues), which makes the network harder to optimize with first-order methods. We show that adding residual connections mitigates this effect substantially, suggesting a mechanism by which residual connections improve training.
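As a concrete illustration of the kind of measurement the abstract describes, the sketch below (not the authors' code) forms the exact Hessian of a toy fully-connected network with JAX and computes a crude outlier statistic on its eigenvalue spectrum. The architecture, the random data, and the 90th-percentile "bulk edge" are illustrative assumptions; for networks of practical size the full Hessian is intractable, and the spectrum has to be estimated with iterative methods such as Lanczos iteration instead.

    import jax
    import jax.numpy as jnp

    def init_params(key, widths):
        """Random weights for a small fully-connected network."""
        params = []
        for fan_in, fan_out in zip(widths[:-1], widths[1:]):
            key, sub = jax.random.split(key)
            params.append(jax.random.normal(sub, (fan_in, fan_out)) / jnp.sqrt(fan_in))
        return params

    def loss(flat_params, shapes, x, y):
        """MSE loss of the network. Takes a flat parameter vector so the
        Hessian comes out as a single (P, P) matrix."""
        params, idx = [], 0
        for shape in shapes:
            size = shape[0] * shape[1]
            params.append(flat_params[idx:idx + size].reshape(shape))
            idx += size
        h = x
        for w in params[:-1]:
            h = jnp.tanh(h @ w)
        pred = h @ params[-1]
        return jnp.mean((pred - y) ** 2)

    key = jax.random.PRNGKey(0)
    widths = [4, 8, 8, 1]  # illustrative toy architecture, not from the paper
    params = init_params(key, widths)
    shapes = [w.shape for w in params]
    flat = jnp.concatenate([w.ravel() for w in params])

    x = jax.random.normal(jax.random.PRNGKey(1), (64, 4))
    y = jax.random.normal(jax.random.PRNGKey(2), (64, 1))

    # Full (P, P) Hessian of the loss at the current parameters; only
    # feasible because the toy model has ~100 parameters.
    H = jax.hessian(loss)(flat, shapes, x, y)
    eigs = jnp.linalg.eigvalsh(H)  # ascending order

    # A crude outlier measure: how far the largest eigenvalue sits above
    # the bulk, with the bulk edge taken (arbitrarily) as the 90th
    # percentile of the absolute eigenvalues.
    bulk_edge = jnp.percentile(jnp.abs(eigs), 90)
    print("lambda_max / bulk edge:", float(eigs[-1] / bulk_edge))

If the paper's claim carries over to this toy setting, adding depth should push this ratio up as outlier eigenvalues separate further from the bulk, while adding residual connections should pull it back down.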
