Long short-term memory recurrent neural network architectures for large scale acoustic modeling

Hasim Sak; Andrew W. Senior; Françoise Beaufays

INTERSPEECH (2014), pp. 338-342

Abstract

Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we explore LSTM RNN architectures for large scale acoustic modeling in speech recognition. We recently showed that LSTM RNNs are more effective than DNNs and conventional RNNs for acoustic modeling, considering moderately-sized models trained on a single machine. Here, we introduce the first distributed training of LSTM RNNs using asynchronous stochastic gradient descent optimization on a large cluster of machines. We show that a two-layer deep LSTM RNN where each LSTM layer has a linear recurrent projection layer can exceed state-of-the-art speech recognition performance. This architecture makes more effective use of model parameters than the others considered, converges quickly, and outperforms a deep feed forward neural network having an order of magnitude more parameters.
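
To make the architecture named in the abstract more concrete, the following is a minimal NumPy sketch of an LSTM layer with a linear recurrent projection, stacked two layers deep. It is an illustration under stated assumptions, not the authors' implementation: the function names (init_lstmp, lstmp_step, run_stack, count_params), the stacked-gate parameter layout, the initialization scale, and the layer sizes used with it are all hypothetical. Any further details of the paper's cell that the abstract does not state are omitted, and only the forward pass is shown (no training, and no asynchronous SGD).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstmp(n_in, n_cell, n_proj, rng):
    """Create parameters for one LSTM layer with a recurrent projection.

    Hypothetical layout: the four gate blocks (input, forget, candidate,
    output) are stacked row-wise into single matrices.
    """
    scale = 0.1  # assumed initialization scale, chosen for illustration
    return {
        "W_x": rng.standard_normal((4 * n_cell, n_in)) * scale,
        "W_r": rng.standard_normal((4 * n_cell, n_proj)) * scale,
        "b": np.zeros(4 * n_cell),
        "W_proj": rng.standard_normal((n_proj, n_cell)) * scale,
    }


def lstmp_step(params, x, r_prev, c_prev):
    """Advance one time step; returns the projected output and cell state."""
    # The recurrence uses the low-dimensional projection r_prev rather than
    # the full cell output, which is what shrinks the recurrent matrices.
    z = params["W_x"] @ x + params["W_r"] @ r_prev + params["b"]
    i, f, g, o = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    m = sigmoid(o) * np.tanh(c)      # cell output, size n_cell
    r = params["W_proj"] @ m         # linear projection, size n_proj
    return r, c


def run_stack(layers, xs):
    """Run a sequence of frames xs (T x n_in) through stacked LSTMP layers."""
    states = [
        (np.zeros(p["W_proj"].shape[0]), np.zeros(p["W_proj"].shape[1]))
        for p in layers
    ]
    outputs = []
    for x in xs:
        h = x
        for k, p in enumerate(layers):
            r, c = lstmp_step(p, h, *states[k])
            states[k] = (r, c)
            h = r  # the projected output feeds the next layer
        outputs.append(h)
    return np.stack(outputs)


def count_params(params):
    """Total number of scalar parameters in one layer."""
    return sum(v.size for v in params.values())
```

As a usage example with made-up sizes, layers = [init_lstmp(40, 64, 16, rng), init_lstmp(16, 64, 16, rng)] with rng = np.random.default_rng(0) builds a two-layer stack, and run_stack(layers, xs) maps a (T, 40) array of frames to a (T, 16) array of projected outputs. In this layout the recurrent weights of a layer number 4 * n_cell * n_proj + n_proj * n_cell, compared with 4 * n_cell * n_cell for the same layout without a projection, which is one way to read the abstract's remark about using model parameters more effectively.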

Research Areas

Machine intelligence
