Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Anjuli Kannan; Arindrima Datta; Tara Sainath; Eugene Weinstein; Bhuvana Ramabhadran; Yonghui Wu; Ankur Bapna; Zhifeng Chen

Interspeech 2019 (to appear)

Abstract

Multilingual end-to-end (E2E) models have shown great promise as a means to expand coverage of the world's languages by automatic speech recognition systems. They improve over monolingual E2E systems, especially on low-resource languages, and simplify training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work aims to develop an E2E multilingual system that is equipped to operate in low-latency interactive applications and to handle the challenges of real-world imbalanced data. First, we present a streaming E2E multilingual model. Second, we compare techniques to deal with imbalance across languages. We find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than state-of-the-art conventional monolingual models by at least 10% relative on every language.
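
To make the two techniques named in the abstract more concrete, here is a minimal sketch in plain Python. The abstract does not say how the language vector is built or what the adapter layers look like, so everything below is an assumption for illustration: the language vector is taken to be a one-hot vector appended to each acoustic feature frame, and each adapter is taken to be a small residual bottleneck applied to the output of a shared layer. The language codes, dimensions, and all function and class names are hypothetical.

```python
import random

# Hypothetical language IDs; the abstract does not list the languages.
LANGUAGES = ["hi", "bn", "ta"]


def one_hot(lang):
    """Return a one-hot language vector for `lang` (assumed encoding)."""
    return [1.0 if lang == name else 0.0 for name in LANGUAGES]


def condition_on_language(frame, lang):
    """Append the language vector to one acoustic feature frame."""
    return list(frame) + one_hot(lang)


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class Adapter:
    """Assumed adapter form: x + W_up @ relu(W_down @ x)."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
                       for _ in range(bottleneck)]
        # Zero-initialised up-projection, so a new adapter is the identity
        # and does not disturb the shared model before it is trained.
        self.w_up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_down, x)]
        return [xi + ui for xi, ui in zip(x, matvec(self.w_up, hidden))]


rng = random.Random(0)
# One adapter per language; the shared layers are reused by all languages.
adapters = {lang: Adapter(dim=4, bottleneck=2, rng=rng) for lang in LANGUAGES}


def encoder_layer_output(shared_output, lang):
    """Route a shared layer's output through the adapter for `lang`."""
    return adapters[lang](shared_output)
```

In this sketch the shared layers see every language's data, while each adapter holds only a small number of per-language parameters. That split is one plausible way to address data imbalance, since a low-resource language can benefit from the shared weights and still receive its own correction.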

Research Areas

Machine intelligence
