- Cyril Zhang
- Kunal Talwar
- Naman Agarwal
- Rohan Anil
- Tomer Koren
Abstract
State-of-the-art optimization has increasingly moved toward massively parallel pipelines with extremely large batches. As a consequence, the performance bottleneck is shifting towards the CPU- and disk-bound data loading and preprocessing, as opposed to hardware-accelerated backpropagation. In this regime, a recently proposed approach is data echoing (Choi et al. '19), which takes repeated gradient steps on the same batch. We provide the first convergence analysis of data echoing-based extensions of ubiquitous optimization methods, exhibiting provable improvements over their synchronous counterparts. Specifically, we show that asynchronous batch reuse can magnify the gradient signal in a stochastic batch, without harming the statistical rate.
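To make the batch-reuse idea concrete, here is a minimal sketch of data-echoed SGD in Python. The helper names (`data_echoed_sgd`, `echo_factor`, `grad_fn`) and the plain-SGD update are illustrative assumptions for exposition, not the paper's exact algorithm or analysis.

```python
import numpy as np

def data_echoed_sgd(grad_fn, params, batch_stream, lr=0.1, echo_factor=4):
    """SGD where each fetched batch is reused ('echoed') for several
    consecutive gradient steps, keeping the accelerator busy while the
    CPU-/disk-bound pipeline prepares the next batch.
    (Illustrative sketch; not the paper's algorithm.)"""
    for batch in batch_stream:           # slow: data loading / preprocessing
        for _ in range(echo_factor):     # fast: repeated steps on the same batch
            params = params - lr * grad_fn(params, batch)
    return params

# Toy usage: least-squares regression on synthetic mini-batches.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)

def make_batches(n_batches, batch_size=32):
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 5))
        y = X @ w_true + 0.01 * rng.normal(size=batch_size)
        yield X, y

def squared_loss_grad(w, batch):
    X, y = batch
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = data_echoed_sgd(squared_loss_grad, np.zeros(5), make_batches(200),
                    lr=0.05, echo_factor=4)
print(np.linalg.norm(w - w_true))  # small residual: echoed steps still converge
```

The inner loop is where echoing differs from ordinary mini-batch SGD: each batch fetched by the slow pipeline is consumed `echo_factor` times before the next one is requested.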