Google Research

An Empirical study of learning rates in deep neural networks for speech recognition

Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE, Vancouver, CA (2013) (to appear)


Recent deep neural network systems for large vocabulary speech recognition are trained with minibatch stochastic gradient descent but use a variety of learning rate scheduling schemes. We investigate several of these schemes, particularly AdaGrad. Based on our analysis of its limitations, we propose a new variant ‘AdaDec’ that decouples long-term learning-rate scheduling from per-parameter learning rate variation. AdaDec was found to result in higher frame accuracies than other methods. Overall, careful choice of learning rate schemes leads to faster convergence and lower word error rates

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work