Large Scale Distributed Acoustic Modeling With Back-off N-grams

Ciprian Chelba; Peng Xu; Fernando Pereira; Thomas Richardson

ICSI, Berkeley, California (2013)

Abstract

Google Voice Search is an application that provides a data-rich setup for both language and acoustic modeling research.

We revive an older approach to acoustic modeling that borrows from n-gram language modeling, aiming to scale both the amount of training data and the model size (as measured by the number of parameters in the model) to approximately 100 times the sizes currently used in automatic speech recognition.
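To make the n-gram analogy concrete, here is a minimal sketch of a back-off lookup over phonetic contexts. The abstract does not describe the back-off scheme, so everything here is an assumption for illustration: the `back_off_lookup` name, the representation of a context as (left phones, center phone, right phones), and the choice to trim one phone from each side per back-off step.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: (left phones, center phone, right phones).
# In each side tuple, the phone adjacent to the center is the one kept longest.
Context = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def back_off_lookup(
    models: Dict[Context, str],
    left: Tuple[str, ...],
    center: str,
    right: Tuple[str, ...],
) -> Optional[Tuple[Context, str]]:
    """Return the entry for the widest context that is present in `models`.

    Assumed scheme: when the full context is missing, drop the phone farthest
    from the center on each side and retry, much as an n-gram language model
    falls back to a shorter history.
    """
    while True:
        key = (left, center, right)
        if key in models:
            return key, models[key]
        if not left and not right:
            return None  # no context-independent entry either
        left = left[1:]     # drop the farthest phone on the left
        right = right[:-1]  # drop the farthest phone on the right


# Toy inventory with a triphone entry and a context-independent entry.
toy_models = {
    (("ae",), "k", ("t",)): "triphone model",
    ((), "k", ()): "context-independent model",
}
print(back_off_lookup(toy_models, ("s", "ae"), "k", ("t", "s")))
```

In this toy inventory the quinphone request is not found, so the lookup settles on the triphone entry; a request with an unseen triphone context would fall through to the context-independent entry.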

Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence.
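As a rough illustration of N-best list rescoring in general, the sketch below re-ranks first-pass hypotheses after combining their scores with scores from a second model. The abstract does not say how the scores are combined, so the linear interpolation in the log domain, the `weight` parameter, and the `Hypothesis` and `rescore_nbest` names are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float   # log-domain score from the first-pass recognizer
    second_pass_score: float  # log-domain score from the rescoring model


def rescore_nbest(nbest: List[Hypothesis], weight: float = 0.5) -> List[Hypothesis]:
    """Return hypotheses sorted best-first by an interpolated score.

    Assumed combination: (1 - weight) * first pass + weight * second pass.
    With weight = 0 the first-pass ranking is left unchanged.
    """
    def combined(h: Hypothesis) -> float:
        return (1.0 - weight) * h.first_pass_score + weight * h.second_pass_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up scores, for illustration only.
nbest = [
    Hypothesis("recognize speech", -10.0, -30.0),
    Hypothesis("wreck a nice beach", -12.0, -20.0),
]
best = rescore_nbest(nbest, weight=0.5)[0]
print(best.text)
```

Because the second pass only re-ranks a fixed list, it can never recover a transcription that the first pass left out of the N-best list.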

Models ranging in size from 20 to 40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word-error-rate of 11% and 6% when combined with first-pass models trained using maximum likelihood and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.

Research Areas

Machine intelligence
