Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks

Hasim Sak; Oriol Vinyals; Georg Heigold; Andrew Senior; Erik McDermott; Rajat Monga; Mark Mao

Interspeech (2014)

Abstract

We recently showed that Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform state-of-the-art deep neural networks (DNNs) for large-scale acoustic modeling when the models are trained with the cross-entropy (CE) criterion. It has also been shown that sequence discriminative training of DNNs initially trained with the CE criterion gives significant improvements.
In this paper, we investigate sequence discriminative training of LSTM RNNs in a large-scale acoustic modeling task. We train the models in a distributed manner using asynchronous stochastic gradient descent (ASGD) optimization. We compare two sequence discriminative criteria, maximum mutual information (MMI) and state-level minimum Bayes risk (sMBR), and we investigate a number of variations of the basic training strategy to better understand issues raised by both the sequential model and the objective function. We obtain significant gains over the CE-trained LSTM RNN model using sequence discriminative training techniques.
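To make the two criteria named above more concrete, here is a toy sketch that scores both on a small, hand-enumerated set of competing hypotheses. It is an illustration under stated assumptions, not the paper's implementation. Real systems compute these quantities over decoding lattices with forward-backward and differentiate them with respect to the network outputs. The hypothesis scores, the state sequences, the acoustic scale `kappa`, and all function names below are hypothetical.

```python
import math


def log_sum_exp(values):
    # Numerically stable log(sum(exp(v))) over a non-empty list.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_posteriors(hyps, kappa):
    # Joint score per hypothesis: kappa * acoustic log-likelihood + LM log-probability,
    # normalised over the enumerated competitors (a toy stand-in for a denominator lattice).
    scores = [kappa * h["am_loglik"] + h["lm_logprob"] for h in hyps]
    norm = log_sum_exp(scores)
    return [s - norm for s in scores]


def mmi(hyps, ref_index, kappa=1.0):
    # MMI: log posterior of the reference transcription (higher is better, at most 0).
    return log_posteriors(hyps, kappa)[ref_index]


def smbr(hyps, ref_states, kappa=1.0):
    # State-level MBR: expected number of frames whose state label matches the
    # reference alignment, weighted by each hypothesis's posterior (higher is better).
    expected = 0.0
    for lp, h in zip(log_posteriors(hyps, kappa), hyps):
        correct = sum(1 for s, r in zip(h["states"], ref_states) if s == r)
        expected += math.exp(lp) * correct
    return expected


if __name__ == "__main__":
    # Hypothetical toy data: 4 frames, 3 competing hypotheses; the first is the reference.
    ref_states = [1, 1, 2, 3]
    hyps = [
        {"am_loglik": -10.0, "lm_logprob": -1.0, "states": [1, 1, 2, 3]},
        {"am_loglik": -11.0, "lm_logprob": -1.5, "states": [1, 2, 2, 3]},
        {"am_loglik": -14.0, "lm_logprob": -0.5, "states": [4, 4, 2, 3]},
    ]
    print("MMI :", mmi(hyps, ref_index=0, kappa=1.0))
    print("sMBR:", smbr(hyps, ref_states, kappa=1.0))
```

The contrast the sketch tries to show is that MMI rewards probability mass on the exact reference transcription. sMBR instead gives partial credit to competitors whose frame-level state labels mostly agree with the reference alignment.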

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs