Super-Acceleration with Cyclical Step-sizes
Abstract
Cyclical step-sizes have become increasingly popular in deep learning. Motivated by recent observations on the spectral gaps of Hessians in machine learning, we show that these step-size schedules offer a simple way to exploit such properties. More precisely, we develop a convergence rate analysis for quadratic objectives that provides optimal parameters and shows that cyclical learning rates can improve upon traditional complexity lower bounds. We further propose a systematic approach to design optimal first-order methods for quadratic minimization with a given spectral structure. Finally, we provide a local convergence rate analysis of these methods beyond quadratic minimization, and illustrate these findings through benchmarks on least squares and logistic regression problems.
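The effect of a spectral gap can be illustrated with a small sketch. It is not the method developed in this paper: it uses plain gradient descent with a period-2 step-size schedule on a diagonal quadratic. The spectrum, split into the clusters [1, 2] and [99, 100], and the step-sizes `1/1.5` and `1/99.5` (the inverse cluster centres) are hypothetical choices made for illustration only. After one two-step cycle, every eigencomponent is multiplied by (1 - h1·λ)(1 - h2·λ). Over both clusters, the magnitude of this factor is at most about 0.33. By comparison, the best constant step 2/(μ + L) contracts by only 99/101 ≈ 0.98 per step when μ = 1 and L = 100.

```python
import numpy as np

def run_gd(eigs, step_sizes, n_iters, x0):
    """Gradient descent on f(x) = 0.5 * sum(eigs * x**2), cycling through step_sizes.

    Returns the distance to the minimizer (the origin) after n_iters steps.
    """
    x = x0.copy()
    for t in range(n_iters):
        h = step_sizes[t % len(step_sizes)]  # cyclical schedule: h_0, h_1, h_0, ...
        grad = eigs * x                      # gradient of the diagonal quadratic
        x = x - h * grad
    return np.linalg.norm(x)

rng = np.random.default_rng(0)
# Hypothetical Hessian spectrum with a gap: one cluster in [1, 2], one in [99, 100].
eigs = np.concatenate([rng.uniform(1.0, 2.0, 50), rng.uniform(99.0, 100.0, 50)])
x0 = rng.standard_normal(eigs.size)

mu, L = 1.0, 100.0
constant = [2.0 / (mu + L)]         # classical optimal constant step for [mu, L]
cyclical = [1.0 / 1.5, 1.0 / 99.5]  # illustrative: one step per cluster centre

n = 100  # an even number of iterations, i.e. 50 full cycles
err_const = run_gd(eigs, constant, n, x0)
err_cyc = run_gd(eigs, cyclical, n, x0)
print(f"constant step: {err_const:.3e}   cyclical steps: {err_cyc:.3e}")
```

Within each cycle, the large step temporarily amplifies the high-curvature components by a factor of up to about 66, and the small step then cancels almost all of that amplification. This is why the schedule must be analysed per cycle rather than per iteration. Per iteration, the cyclical schedule contracts by roughly √0.33 ≈ 0.57. This is faster than the asymptotic rate (√κ - 1)/(√κ + 1) = 9/11 ≈ 0.82 attainable by optimal methods tuned only to the interval [1, 100].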