Discover: Deep Scalable Variance Reduction

Lionel Ngoupeyou Tondji
Moustapha Cisse
CoRR, abs/2111.11828 (2021)

Abstract

Most variance reduction methods for stochastic optimization are designed primarily for smooth and strongly convex functions, and they often come with high memory requirements. Consequently, they do not scale to large-scale deep learning settings, which involve massive neural networks and virtually infinite data due to the use of data augmentation strategies. In this work, we extend convex online variance reduction to the realm of deep learning. We exploit the ubiquitous clustering structure of rich datasets used in deep learning to design a scalable variance-reduced optimization procedure. Our proposal makes it possible to leverage prior knowledge about a given problem to speed up the learning process. It is robust and theoretically well motivated. Our experiments show that it is superior to or on par with the most widely used optimizers in deep learning on standard benchmark datasets.
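
The abstract does not state the update rule, so the following is only a rough illustration of the general idea of cluster-based variance reduction, not the paper's algorithm. It assumes one stored gradient per cluster (rather than per example, which is what keeps memory small) and uses it as a control variate. The function names, the moving-average update, the `decay` parameter, and the toy numbers are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def cluster_vr_gradient(grad: Vector, cluster: int,
                        memory: Dict[int, Vector],
                        weights: Dict[int, float]) -> Vector:
    """Control-variate estimate (illustrative assumption, not the paper's rule):
    grad - memory[cluster] + sum_c weights[c] * memory[c]."""
    dim = len(grad)
    # Weighted mean of the stored per-cluster gradients (the correction term).
    mean_memory = [sum(weights[c] * memory[c][k] for c in memory)
                   for k in range(dim)]
    return [grad[k] - memory[cluster][k] + mean_memory[k] for k in range(dim)]


def update_memory(memory: Dict[int, Vector], cluster: int,
                  grad: Vector, decay: float = 0.9) -> None:
    """Hypothetical in-place moving average of the gradients seen for one cluster."""
    memory[cluster] = [decay * m + (1.0 - decay) * g
                       for m, g in zip(memory[cluster], grad)]


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def variance(xs: List[float]) -> float:
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


# Toy data: four 1-D per-example gradients, grouped into two equally likely clusters.
samples = [([1.0], 0), ([1.2], 0), ([-3.0], 1), ([-2.8], 1)]
weights = {0: 0.5, 1: 0.5}
# Stored gradients, here set to the per-cluster means for illustration.
memory = {0: [1.1], 1: [-2.9]}

plain = [g[0] for g, _ in samples]
reduced = [cluster_vr_gradient(g, c, memory, weights)[0] for g, c in samples]

print("mean:", mean(plain), mean(reduced))
print("variance:", variance(plain), variance(reduced))
```

In this toy setting the two estimators have the same mean, because each cluster is sampled with probability equal to its weight, while the spread of the corrected gradients is much smaller, because examples within a cluster have similar gradients. Storing one vector per cluster instead of one per example is the design choice that makes such a scheme plausible for large, augmented datasets.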
