Bridging the gap between streaming and non-streaming automatic speechrecognition systems through distillation of an ensemble of models

Chung-Cheng Chiu

Liangliang Cao

Olivier Siohan

Ruoming Pang

Thibault Doutre

Wei Han

Interspeech'2021

Google Scholar

Abstract

Streaming end-to-end automatic speech recognition (ASR) systems are widely used in everyday applications that require transcribing speech to text in real-time. Their small size and minimal latency make them suitable for such tasks. Unlike their non-streaming counterparts, streaming models are constrained to be causal with no future context. Nevertheless, non-streaming models can be used as teacher models to improve streaming ASR systems. An arbitrarily large set of unsupervised utterances is distilled from such teacher models so that streaming models can be trained using these generated labels. However, the performance gap between teacher and student world error rates (WER) remains high. In this paper, we propose to reduce this gap by using a diversified set of non-streaming teacher models and combining them using Recognizer Output Voting Error Reduction (ROVER). Fusing RNN-T and CTC models makes stronger teachers as they improve the performance of streaming student models. In this paper, we outperform a baseline streaming RNN-T trained from non-streaming RNN-T teachers by 27\% to 42\% depending on the language.

Research Areas

Speech Processing

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Bridging the gap between streaming and non-streaming automatic speechrecognition systems through distillation of an ensemble of models

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Bridging the gap between streaming and non-streaming automatic speechrecognition systems through distillation of an ensemble of models

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities