- Lionel Ngoupeyou Tondji
- Moustapha Cisse
- Sergii Kashubin
Abstract
Most variance reduction methods for stochastic optimization are primarily designed for smooth and strongly convex functions, and they often come with high memory requirements. Consequently, they do not scale to large-scale deep learning settings, where massive neural networks are trained on virtually unlimited data due to the use of data augmentation strategies. In this work, we extend convex online variance reduction to the realm of deep learning. We exploit the ubiquitous clustering structure of rich datasets used in deep learning to design a scalable variance-reduced optimization procedure. Our proposal makes it possible to leverage prior knowledge about a given problem to speed up the learning process; it is robust and theoretically well-motivated. Our experiments show that it is superior to, or on par with, the most widely used optimizers in deep learning on standard benchmark datasets.
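The abstract only describes the idea at a high level. As a rough illustration of how cluster structure can act as a control variate for stochastic gradients, the sketch below (a minimal NumPy example, not the authors' algorithm) stores one snapshot gradient per cluster, a table of size k×d rather than the n×d table kept by SAGA-style methods, and uses it to correct each per-example gradient on a synthetic least-squares problem. All names, cluster labels, and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem with a clustered design matrix.
n, d, k = 600, 10, 3                      # samples, features, clusters
cluster_id = rng.integers(0, k, size=n)   # hypothetical cluster labels
centers = rng.normal(size=(k, d))
X = centers[cluster_id] + 0.1 * rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

def grad_i(w, i):
    """Per-example gradient of 0.5 * (x_i^T w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

def full_grad(w):
    return X.T @ (X @ w - y) / n

# Cluster-anchored control variate: keep one snapshot gradient per cluster
# (k x d memory) and correct each stochastic gradient with it.
w = np.zeros(d)
lr, epochs = 0.02, 20
for epoch in range(epochs):
    w_snap = w.copy()
    cluster_grad = np.zeros((k, d))
    for c in range(k):
        idx = np.where(cluster_id == c)[0]
        cluster_grad[c] = np.mean([grad_i(w_snap, i) for i in idx], axis=0)
    mu = full_grad(w_snap)  # full gradient at the snapshot
    for i in rng.permutation(n):
        c = cluster_id[i]
        # Unbiased variance-reduced estimate:
        #   g_i(w) - mean cluster gradient at snapshot + full snapshot gradient
        g = grad_i(w, i) - cluster_grad[c] + mu
        w -= lr * g
    print(f"epoch {epoch:2d}  loss = {0.5 * np.mean((X @ w - y) ** 2):.6f}")
```

The estimator stays unbiased because the size-weighted average of the per-cluster snapshot gradients equals the full snapshot gradient; how much variance it removes depends on how well each cluster mean approximates the individual gradients within that cluster, which is why the clustering structure of the data matters.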