Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Wei Li
James Qin
Chung-Cheng Chiu
Ruoming Pang
(2020)

Abstract

Two-pass models have achieved better quality for on-device speech recognition: a 1st-pass recurrent neural network transducer (RNN-T) model generates hypotheses in a streaming fashion, and a 2nd-pass Listen, Attend and Spell (LAS) model rescores those hypotheses with full audio-sequence context. Such models provide both fast responsiveness from the 1st-pass model and better quality from the 2nd-pass model. The computation latency of the 2nd-pass model is a critical problem, as it must wait for the full audio and the 1st-pass hypotheses before it can run. Moreover, the rescoring latency is bounded by the recurrent nature of the LSTM, since each hypothesis sequence must be processed step by step. In this work we explore replacing the LSTM layers in the 2nd-pass rescorer with Transformer layers, which can process entire hypothesis sequences in parallel and can therefore use on-device computation resources more efficiently. Compared with an LAS-based baseline, our proposed Transformer rescorer reduces latency by more than 50% while also improving quality.
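The latency argument above rests on a structural difference between the two rescorers. The following toy sketch (not the paper's implementation; all shapes, weights, and function names are illustrative assumptions) shows why an LSTM-style rescorer needs one sequential step per token, while a Transformer-style self-attention rescorer can score all positions of a hypothesis in a single batched pass:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # toy embedding/model dimension (assumption)
T = 6   # hypothesis length in tokens (assumption)

def lstm_like_rescore(embeddings, w_h, w_x):
    """Sequential scoring: each step depends on the previous hidden state,
    so the T steps cannot be parallelized across time."""
    h = np.zeros(d)
    steps = 0
    for x in embeddings:
        h = np.tanh(w_h @ h + w_x @ x)  # recurrence forces serial execution
        steps += 1
    return float(h.sum()), steps        # toy scalar "score" and step count

def transformer_like_rescore(embeddings, w_qkv):
    """Parallel scoring: one self-attention pass covers every position
    of the hypothesis at once (a single matrix computation)."""
    q, k, v = (embeddings @ w for w in w_qkv)     # all positions together
    attn = np.exp(q @ k.T / np.sqrt(d))
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over positions
    out = attn @ v                                # (T, d) in one pass
    return float(out.sum()), 1

emb = rng.normal(size=(T, d))
_, seq_steps = lstm_like_rescore(
    emb, rng.normal(size=(d, d)), rng.normal(size=(d, d)))
_, par_steps = transformer_like_rescore(
    emb, [rng.normal(size=(d, d)) for _ in range(3)])
print(seq_steps, par_steps)  # 6 sequential steps vs. 1 parallel pass
```

Because rescoring sees the complete hypothesis up front (unlike streaming decoding), there is no causality constraint, and the whole sequence can be fed through the Transformer layers at once; this is what lets the Transformer rescorer exploit on-device parallelism that the LSTM leaves idle.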