Efficient Full-Matrix Adaptive Regularization

Naman Agarwal; Brian Anderson Bullins; Xinyi Chen; Elad Hazan; Karan Singh; Cyril Zhang; Yi Zhang


ICML (2019)


Abstract

Adaptive regularization methods pre-multiply a descent direction by a preconditioning matrix. Because machine learning problems have a large number of parameters, full-matrix preconditioning methods are prohibitively expensive. We show how to modify full-matrix adaptive regularization to make it practical and effective. We also provide a novel theoretical analysis of adaptive regularization in non-convex optimization settings. The core of our algorithm, termed GGT, is the efficient computation of the inverse square root of a low-rank matrix. Our preliminary experiments show an improved convergence rate for GGT across a variety of synthetic tasks and standard deep learning benchmarks.
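To make the low-rank idea concrete, here is a minimal NumPy sketch. It assumes the preconditioner has the form (εI + (GGᵀ)^{1/2})^{-1}, where the d×r matrix G holds r recent gradients as columns with r much smaller than d. The function name `ggt_precondition`, the `eps` default, and the `rel_tol` rank cutoff are hypothetical choices for illustration, not the authors' code. The trick is to eigendecompose the small r×r Gram matrix GᵀG instead of the d×d matrix GGᵀ: with the thin SVD G = UΣVᵀ, the identity (εI + UΣUᵀ)^{-1} g = U[(Σ + εI)^{-1} − ε^{-1}I]Uᵀg + ε^{-1}g needs only d×r and r×r operations.

```python
import numpy as np


def ggt_precondition(G, g, eps=1e-4, rel_tol=1e-6):
    """Compute (eps*I + (G G^T)^{1/2})^{-1} @ g without forming a d x d matrix.

    Hypothetical helper for illustration.
    G: (d, r) array whose columns are recent gradients, with r much smaller than d.
    g: (d,) gradient to precondition.
    """
    # Eigendecompose the small r x r Gram matrix: G^T G = V diag(sigma^2) V^T.
    evals, V = np.linalg.eigh(G.T @ G)
    sigma = np.sqrt(np.clip(evals, 0.0, None))  # singular values of G

    # Directions with sigma = 0 contribute nothing to the correction term,
    # so dropping numerically zero ones is safe and avoids dividing by zero.
    keep = sigma > rel_tol * sigma.max()
    sigma, V = sigma[keep], V[:, keep]

    # Left singular vectors of G: U = G V diag(1/sigma), shape (d, k).
    U = (G @ V) / sigma

    # (eps*I + U diag(sigma) U^T)^{-1} g
    #   = U [(diag(sigma) + eps*I)^{-1} - I/eps] U^T g + g/eps
    coeffs = U.T @ g
    return U @ (coeffs * (1.0 / (sigma + eps) - 1.0 / eps)) + g / eps
```

Each call costs roughly O(dr² + r³) instead of the O(d³) needed for a dense d×d inverse square root. Forming GᵀG squares the condition number of G; calling `np.linalg.svd(G, full_matrices=False)` directly is a more numerically careful alternative at similar cost.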

Research Areas

Machine intelligence
