Learned Optimizers that Scale and Generalize

Olga Wichrowska; Niru Maheswaranathan; Matthew W. Hoffman; Sergio Gomez Colmenarejo; Misha Denil; Nando de Freitas; Jascha Sohl-Dickstein

ICML (2017)

Abstract

Learning to learn has emerged as an important direction for achieving artificial intelligence. Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks. We introduce a learned gradient descent optimizer that generalizes well to new tasks and has significantly reduced memory and computation overhead. We achieve this with a novel hierarchical RNN architecture that has minimal per-parameter overhead and is augmented with additional architectural features that mirror the known structure of optimization tasks. We also develop a meta-training ensemble of small, diverse optimization tasks that capture common properties of loss landscapes. The optimizer learns to outperform RMSProp and Adam on problems in this corpus. More importantly, it performs comparably or better when applied to small convolutional neural networks, despite seeing no neural networks in its meta-training set. Finally, it generalizes to train Inception V3 and ResNet V2 architectures on the ImageNet dataset, optimization problems of a vastly different scale than those it was trained on.
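
To make the idea of a learned optimizer concrete, the sketch below contrasts a hand-designed RMSProp update with a toy learned update rule on a two-parameter quadratic. It is an illustration only and does not reproduce the hierarchical RNN described in the abstract: `learned_step`, its single-unit recurrent cell, the three `weights`, and the `run` helper are all hypothetical, and the weights are picked by hand here, whereas in a learned optimizer they would come from meta-training on a corpus of tasks. What the sketch does share with the abstract is the small per-parameter state and the sharing of optimizer weights across all parameters.

```python
import math


def rmsprop_step(params, grads, state, lr=0.01, decay=0.9, eps=1e-8):
    """Hand-designed baseline: the standard RMSProp update, one scalar of state per parameter."""
    new_params, new_state = [], []
    for p, g, v in zip(params, grads, state):
        v = decay * v + (1.0 - decay) * g * g  # running average of squared gradients
        new_params.append(p - lr * g / (math.sqrt(v) + eps))
        new_state.append(v)
    return new_params, new_state


def learned_step(params, grads, state, weights):
    """Hypothetical learned rule (assumption, not the paper's architecture).

    A one-unit recurrent cell maps each gradient to an update. The same three
    weights are shared by every parameter, so the per-parameter cost is a
    single hidden scalar.
    """
    w_grad, w_hidden, w_out = weights
    new_params, new_state = [], []
    for p, g, h in zip(params, grads, state):
        h = math.tanh(w_grad * g + w_hidden * h)  # recurrent state update
        new_params.append(p + w_out * h)          # update read out from the state
        new_state.append(h)
    return new_params, new_state


def quadratic_grads(params):
    """Gradient of the toy loss f(x) = sum(x_i ** 2)."""
    return [2.0 * p for p in params]


def run(step_fn, steps=100, **kwargs):
    """Apply step_fn for a fixed number of steps and return the final loss."""
    params = [1.0, -2.0]  # initial loss is 1 + 4 = 5
    state = [0.0, 0.0]
    for _ in range(steps):
        grads = quadratic_grads(params)
        params, state = step_fn(params, grads, state, **kwargs)
    return sum(p * p for p in params)


if __name__ == "__main__":
    print("RMSProp final loss:", run(rmsprop_step))
    # Hand-picked weights standing in for meta-trained ones.
    print("Toy learned rule final loss:", run(learned_step, weights=(1.0, 0.5, -0.05)))
```

Both rules expose the same interface (parameters, gradients, and per-parameter state in; updated parameters and state out), which is what allows a learned rule to be substituted for a hand-designed one without changing the training loop.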

Research Areas

Machine intelligence
