Google Research

Discover: Deep Scalable Variance Reduction

CoRR, vol. abs/2111.11828 (2021)


Most variance reduction methods for stochastic optimization are designed for smooth, strongly convex objectives and often carry high memory requirements. As a result, they do not scale to large-scale deep learning settings, which involve massive neural networks and virtually infinite data due to data augmentation. In this work, we extend convex online variance reduction to the realm of deep learning. We exploit the clustering structure ubiquitous in rich deep learning datasets to design a scalable variance-reduced optimization procedure. Our proposal leverages prior knowledge about a given problem to speed up learning, and it is robust and theoretically well-motivated. Our experiments show that it matches or outperforms the most widely used deep learning optimizers on standard benchmark datasets.
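To illustrate the general idea of cluster-based variance reduction (this is a minimal sketch under our own assumptions, not the paper's actual Discover algorithm), consider a SAGA-style update that stores one running control variate per data cluster instead of per example, so memory scales with the number of clusters K rather than the dataset size. The names `grad_fn` and `clusters` are hypothetical:

```python
import numpy as np

def clustered_vr_sgd(grad_fn, x0, clusters, n_steps=1000, lr=0.1, seed=0):
    """Variance-reduced SGD with one control variate per data cluster.

    grad_fn(x, i) returns the stochastic gradient for example i.
    clusters[i] is the cluster id of example i; keeping one running
    gradient per cluster (not per example) needs only O(K * d) memory.
    Hypothetical sketch -- not the Discover algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    clusters = np.asarray(clusters)
    n = len(clusters)
    K = int(np.max(clusters)) + 1
    x = np.asarray(x0, dtype=float).copy()
    c = np.zeros((K, x.size))              # per-cluster control variates
    weights = np.bincount(clusters, minlength=K) / n
    for _ in range(n_steps):
        i = int(rng.integers(n))
        k = clusters[i]
        g = grad_fn(x, i)
        # SAGA-style estimate: subtract this cluster's control variate and
        # add back the size-weighted average, which keeps the estimate
        # unbiased; its variance is only the within-cluster variance.
        g_vr = g - c[k] + weights @ c
        x -= lr * g_vr
        c[k] = g                           # refresh the cluster's memory
    return x
```

For example, on the separable quadratic `0.5 * ||x - y[i]||^2` the stochastic gradient is `x - y[i]`, and the iterates converge to the mean of the `y[i]`; when examples within a cluster are similar, the residual gradient noise shrinks accordingly.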
