Using Machine Learning to Discover Neural Network Optimizers
March 28, 2018
Posted by Irwan Bello, Research Associate, Google Brain Team
Deep learning models have been deployed in numerous Google products, such as Search, Translate and Photos. The choice of optimization method plays a major role when training deep learning models. For example, stochastic gradient descent works well in many situations, but more advanced optimizers can be faster, especially for training very deep networks. Coming up with new optimizers for neural networks, however, is challenging due to the non-convex nature of the optimization problem. On the Google Brain team, we wanted to see whether it might be possible to automate the discovery of new optimizers, in a way that is similar to how AutoML has been used to discover new competitive neural network architectures.
In “Neural Optimizer Search with Reinforcement Learning”, we present a method to discover optimization methods with a focus on deep learning architectures. Using this method we found two new optimizers, PowerSign and AddSign, that are competitive on a variety of different tasks and architectures, including ImageNet classification and Google’s neural machine translation system. To help others benefit from this work, we have made the optimizers available in TensorFlow.
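To give a rough sense of what the discovered update rules look like, here is a minimal NumPy sketch of AddSign- and PowerSign-style steps as described in the paper, where `m` denotes a running average of the gradient. This is an illustrative sketch, not the released TensorFlow implementation, and the function and argument names are our own:

```python
import numpy as np

def addsign_update(w, g, m, lr=0.01, beta=0.9):
    """One AddSign-style step: scale the gradient by (1 + sign(g) * sign(m)),
    where m is an exponential moving average of past gradients."""
    m = beta * m + (1 - beta) * g                      # update the running average
    w = w - lr * (1 + np.sign(g) * np.sign(m)) * g
    return w, m

def powersign_update(w, g, m, lr=0.01, beta=0.9):
    """One PowerSign-style step: scale the gradient by exp(sign(g) * sign(m))."""
    m = beta * m + (1 - beta) * g
    w = w - lr * np.exp(np.sign(g) * np.sign(m)) * g
    return w, m
```

Intuitively, both rules take a larger step when the current gradient agrees in sign with its running average, and a smaller step when they disagree.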
Neural Optimizer Search makes use of a recurrent neural network controller which is given access to a list of simple primitives that are typically relevant for optimization. These primitives include, for example, the gradient or the running average of the gradient, and lead to search spaces with over 10^10 possible combinations. The controller then generates the computation graph for a candidate optimizer or update rule in that search space.
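To make the structure of such a search space concrete, here is a small, hypothetical sketch in which an update term is composed by choosing two operands from a list of primitives, applying a unary function to each, and combining them with a binary operation. The primitive set and grammar below are illustrative placeholders, not the exact search space from the paper:

```python
import numpy as np

# A toy set of operand primitives (the paper's list is larger).
PRIMITIVES = {
    "g":      lambda s: s["g"],           # gradient
    "m":      lambda s: s["m"],           # running average of the gradient
    "sign_g": lambda s: np.sign(s["g"]),
    "sign_m": lambda s: np.sign(s["m"]),
}
UNARY  = {"identity": lambda x: x, "exp": np.exp, "negate": lambda x: -x}
BINARY = {"add": np.add, "mul": np.multiply}

def make_update_term(op1, u1, op2, u2, binop):
    """Compose an update term from the controller's discrete choices."""
    def term(state):
        a = UNARY[u1](PRIMITIVES[op1](state))
        b = UNARY[u2](PRIMITIVES[op2](state))
        return BINARY[binop](a, b)
    return term

# e.g. sign(g) * sign(m), the agreement term that appears in PowerSign and AddSign:
agreement = make_update_term("sign_g", "identity", "sign_m", "identity", "mul")
```

Even this toy grammar already yields 4 * 3 * 4 * 3 * 2 = 288 distinct terms; nesting a few such levels, as the controller does, quickly produces billions of candidate update rules.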
In our paper, proposed candidate update rules (U) are used to train a child convolutional neural network on CIFAR-10 for a few epochs, and the final validation accuracy (R) is fed as a reward to the controller. The controller is trained with reinforcement learning to maximize the validation accuracies of the sampled update rules. This process is illustrated below.
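A schematic of this loop might look like the sketch below. The controller interface and the helpers `build_child_cnn` and `train_and_evaluate` are hypothetical stand-ins for illustration, not code from the paper:

```python
def neural_optimizer_search(controller, num_iterations, child_epochs=5):
    """Outer loop: sample candidate update rules, score them on a small
    child network, and reinforce the controller with the reward."""
    for _ in range(num_iterations):
        # The controller samples a candidate update rule U from the search space.
        update_rule, log_prob = controller.sample_update_rule()

        # Train a small child CNN on CIFAR-10 for a few epochs using U.
        child = build_child_cnn()
        reward = train_and_evaluate(child, update_rule, epochs=child_epochs)

        # Reward R = final validation accuracy; a policy-gradient step
        # (e.g. REINFORCE) pushes the controller toward better-performing rules.
        controller.update(log_prob, reward)
```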
An overview of Neural Optimizer Search using an iterative process to discover new optimizers.
A comparison of learning rate decay functions: linear cosine decay, stepwise decay and cosine decay.
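For reference, a small sketch of what a linear cosine decay schedule can look like is shown below. The parameterization follows the style of TensorFlow's `tf.train.linear_cosine_decay` and is an assumption on our part, not a formula quoted from this post:

```python
import math

def linear_cosine_decay(step, decay_steps, learning_rate,
                        num_periods=0.5, alpha=0.0, beta=0.001):
    """Decay the learning rate with a linearly decaying envelope multiplied
    by a cosine oscillation, plus small floor terms alpha and beta."""
    step = min(step, decay_steps)
    linear = (decay_steps - step) / decay_steps
    cosine = 0.5 * (1 + math.cos(math.pi * 2 * num_periods * step / decay_steps))
    return learning_rate * ((alpha + linear) * cosine + beta)
```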
We are excited that Neural Optimizer Search can not only improve the performance of machine learning models but also potentially lead to new, interpretable equations and discoveries. It is our hope that open sourcing these optimizers in TensorFlow will be useful to machine learning practitioners.