A big data approach to acoustic model training corpus selection

Olga Kapralova; John Alex; Eugene Weinstein; Pedro Moreno; Olivier Siohan

Conference of the International Speech Communication Association (Interspeech) (2014)


Abstract

Deep neural networks (DNNs) have recently become the state-of-the-art technology in speech recognition systems. In this paper, we propose a new approach to constructing large, high-quality unsupervised sets to train DNN models for large vocabulary speech recognition. The core of our technique consists of two steps. We first redecode speech logged by our production recognizer with a very accurate (and hence too slow for real-time usage) set of speech models to improve the quality of ground truth transcripts used for training alignments. Using confidence scores, transcript length and transcript flattening heuristics designed to cull salient utterances from three decades of speech per language, we then carefully select training data sets consisting of up to 15K hours of speech to be used to train acoustic models without any reliance on manual transcription. We show that this approach yields models with approximately 18K context-dependent states that achieve a 10% relative improvement in large vocabulary dictation and voice-search systems for the Brazilian Portuguese, French, Italian and Russian languages.
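
The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the three heuristics (confidence scores, transcript length, transcript flattening) but gives no thresholds or exact rules. The `Utterance` type, the function name, and the values of `min_confidence`, `min_words` and `max_per_transcript` are all hypothetical, and "transcript flattening" is assumed here to mean capping how many utterances may share the same transcript so that very frequent queries do not dominate the training set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    utt_id: str
    transcript: str    # transcript produced by the slow, accurate redecoding pass
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


def select_training_utterances(utterances, min_confidence=0.8,
                               min_words=2, max_per_transcript=2):
    """Filter by confidence and length, then flatten the transcript distribution.

    All three thresholds are illustrative assumptions, not values from the paper.
    """
    by_transcript = defaultdict(list)
    for utt in utterances:
        # Confidence heuristic: drop utterances the recognizer is unsure about.
        if utt.confidence < min_confidence:
            continue
        # Length heuristic: drop very short transcripts.
        if len(utt.transcript.split()) < min_words:
            continue
        by_transcript[utt.transcript].append(utt)

    selected = []
    for group in by_transcript.values():
        # Flattening (assumed form): keep only the most confident few
        # utterances for each distinct transcript.
        group.sort(key=lambda u: u.confidence, reverse=True)
        selected.extend(group[:max_per_transcript])
    return selected


logged = [
    Utterance("a", "call mom", 0.95),
    Utterance("b", "call mom", 0.90),
    Utterance("c", "call mom", 0.85),
    Utterance("d", "yes", 0.99),
    Utterance("e", "play some music", 0.50),
    Utterance("f", "navigate to the airport", 0.90),
]
selected_ids = [u.utt_id for u in select_training_utterances(logged)]
```

In this toy run, utterance "c" is dropped by the per-transcript cap, "d" by the length filter, and "e" by the confidence filter. A production-scale version would run as a distributed job over the logged speech rather than over an in-memory list.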
