Full Matrix Preconditioning Made Practical

Beyond First Order Methods in ML, 2019 (to appear)

Abstract

Full matrix preconditioning methods are generally considered intractable due to their computational and memory overheads. However, there has been recent progress in approximating full matrix preconditioners by exploiting the structure of neural networks. In this work, we improve on the Shampoo algorithm, extending it so that it runs efficiently in practice on distributed neural network training systems. We describe our implementation and demonstrate superior performance on a machine translation task, where it achieves a 1.67x speedup in training time by effectively utilizing existing heterogeneous distributed hardware resources.
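To make the approach concrete, the following is a minimal NumPy sketch of the core Shampoo update for a single matrix-shaped parameter, following the original Shampoo algorithm (Gupta et al., 2018). The function and variable names (shampoo_step, inv_pth_root, lr, eps) are illustrative, not from this paper, and the sketch omits the distributed scheduling of preconditioner computation that is the paper's contribution.

```python
import numpy as np

def inv_pth_root(mat, p, eps=1e-6):
    """mat^(-1/p) for a symmetric PSD matrix, via eigendecomposition.

    eps clamps small eigenvalues for numerical stability (an
    illustrative choice, not the paper's exact scheme).
    """
    w, v = np.linalg.eigh(mat)
    w = np.maximum(w, eps)
    return (v * w ** (-1.0 / p)) @ v.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo update for a matrix parameter W with gradient G.

    L (m x m) and R (n x n) accumulate the left and right gradient
    statistics; the preconditioned gradient is L^{-1/4} G R^{-1/4}.
    """
    L = L + G @ G.T
    R = R + G.T @ G
    W = W - lr * inv_pth_root(L, 4) @ G @ inv_pth_root(R, 4)
    return W, L, R

# Illustrative usage: statistics start at a small multiple of identity.
m, n = 8, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))
L, R = 1e-4 * np.eye(m), 1e-4 * np.eye(n)
G = rng.standard_normal((m, n))  # stand-in for a real gradient
W, L, R = shampoo_step(W, G, L, R)
```

The inverse fourth roots are the computational bottleneck of each step; amortizing and distributing that work across available hardware is what the implementation described in the abstract addresses.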