Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Anjuli Kannan

Arindrima Datta

Tara Sainath

Eugene Weinstein

Bhuvana Ramabhadran

Yonghui Wu

Ankur Bapna

Zhifeng Chen

Interspeech 2019 (to appear)

Abstract

Multilingual end-to-end (E2E) models have shown great promise as a means to expand coverage of the world’s languages by automatic speech recognition systems. They improve over monolingual E2E systems, especially on low-resource languages, and simplify training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work aims to develop an E2E multilingual system that is equipped to operate in low-latency interactive applications and to handle the challenges of real-world imbalanced data. First, we present a streaming E2E multilingual model. Second, we compare techniques for dealing with imbalance across languages. We find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves lower word error rate (WER) than state-of-the-art conventional monolingual models by at least 10% relative on every language.
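
The two techniques the abstract combines can be illustrated with a toy sketch. It is not the authors' implementation: the abstract does not say how the language vector is encoded or where the adapters sit, so the one-hot encoding, the residual bottleneck form of the adapter, the zero initialization, the language codes, and every function name below are assumptions chosen for illustration only.

```python
import random

# Hypothetical language inventory; the abstract does not list the languages.
LANGUAGES = ["hi", "bn", "ta"]


def one_hot(lang):
    """Encode a language as a one-hot vector (assumed encoding)."""
    return [1.0 if lang == known else 0.0 for known in LANGUAGES]


def condition_on_language(frame, lang):
    """Append the language vector to one acoustic feature frame."""
    return list(frame) + one_hot(lang)


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def make_adapter(dim, bottleneck, rng):
    """Build one small per-language adapter (assumed residual bottleneck)."""
    down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
    # Zero up-projection: the adapter starts out as the identity function,
    # so adding it does not disturb the shared model before fine-tuning.
    up = [[0.0] * bottleneck for _ in range(dim)]
    return {"down": down, "up": up}


def apply_adapter(adapter, hidden):
    """Return hidden + up(relu(down(hidden)))."""
    projected = relu(matvec(adapter["down"], hidden))
    update = matvec(adapter["up"], projected)
    return [h + u for h, u in zip(hidden, update)]


rng = random.Random(0)
# One adapter per language; the rest of the model would be shared.
adapters = {lang: make_adapter(dim=4, bottleneck=2, rng=rng) for lang in LANGUAGES}

frame = [0.5, -1.0, 0.25]            # toy 3-dimensional feature frame
conditioned = condition_on_language(frame, "bn")

hidden = [1.0, 2.0, 3.0, 4.0]        # toy 4-dimensional encoder activation
adapted = apply_adapter(adapters["bn"], hidden)
```

In this sketch only the entry of `adapters` selected by the utterance's language is applied, so the per-language parameters stay small relative to a shared model, and the language vector gives the shared layers an explicit signal about which language they are processing.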
