The Effect of Network Depth on the Optimization Landscape

  • Behrooz Ghorbani
  • Shankar Krishnan
  • Ying Xiao
Deep Phenomena Workshop, ICML 2019

Abstract

It is well known that deeper neural networks are harder to train than shallower ones. In this short paper, we use the (full) eigenvalue spectrum of the Hessian to explore how the loss landscape changes as the network gets deeper, and as residual connections are added to the architecture. Computing a series of quantitative measures on the Hessian spectrum, we show that the Hessian eigenvalue distribution in deeper networks has substantially heavier tails (equivalently, more outlier eigenvalues), which makes the network harder to optimize with first-order methods. We further show that adding residual connections substantially mitigates this effect, suggesting a mechanism by which residual connections improve training.
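The abstract describes computing the full eigenvalue spectrum of the Hessian, which for modern networks is done matrix-free: the Hessian is never formed explicitly, and only Hessian-vector products (via double backpropagation) are needed. The sketch below is a minimal illustration of this idea, not the authors' implementation: it extracts a few extreme (outlier) eigenvalues with PyTorch Hessian-vector products and SciPy's Lanczos routine. The function name top_hessian_eigenvalues, the single-batch loss, and the CPU-only setup are assumptions made for the example.

```python
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def top_hessian_eigenvalues(model, loss_fn, data, target, k=10):
    """Estimate the k largest Hessian eigenvalues of the loss via
    matrix-free Hessian-vector products and the Lanczos method.

    Illustrative sketch: assumes a CPU model and a single batch,
    not the paper's exact spectrum-estimation procedure.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    n = sum(p.numel() for p in params)

    loss = loss_fn(model(data), target)
    # First-order gradients, with the graph retained so we can
    # differentiate through them again (double backprop).
    grads = torch.autograd.grad(loss, params, create_graph=True)

    def hvp(v):
        # SciPy may pass v with shape (n,) or (n, 1); flatten first.
        v = torch.from_numpy(np.asarray(v)).float().flatten()
        vs, offset = [], 0
        for p in params:
            vs.append(v[offset:offset + p.numel()].view_as(p))
            offset += p.numel()
        # Hessian-vector product: d/dtheta of (grad . v).
        dot = sum((g * u).sum() for g, u in zip(grads, vs))
        hv = torch.autograd.grad(dot, params, retain_graph=True)
        return torch.cat([h.reshape(-1) for h in hv]).detach().numpy()

    op = LinearOperator((n, n), matvec=hvp, dtype=np.float32)
    # "LA" = largest algebraic eigenvalues, i.e. the positive outliers.
    eigvals = eigsh(op, k=k, which="LA", return_eigenvectors=False)
    return np.sort(eigvals)[::-1]
```

Running such a diagnostic on networks of increasing depth, with and without skip connections, and comparing the extreme eigenvalues against the bulk of the spectrum, would surface the kind of outlier behavior the abstract attributes to deeper, residual-free architectures.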
