Parallel Rescoring with Transformer for Streaming On-Device Speech Recognition

Wei Li; James Qin; Chung-Cheng Chiu; Ruoming Pang; Yanzhang (Ryan) He

(2020)

Abstract

Two-pass models have achieved better quality for on-device speech recognition: a 1st-pass recurrent neural network transducer (RNN-T) model generates hypotheses in a streaming fashion, and a 2nd-pass Listen, Attend and Spell (LAS) model rescores the hypotheses with full audio sequence context. Such models provide both fast responsiveness from the 1st-pass model and better quality from the 2nd-pass model. The computation latency of the 2nd-pass model is a critical problem, as the model has to wait until the speech and the 1st-pass hypotheses are complete. Yet the rescoring latency is constrained by the recurrent nature of the LSTM, as each sequence has to be processed sequentially. In this work we explore replacing the LSTM layers in the 2nd-pass rescorer with Transformer layers, which can process entire hypothesis sequences in parallel and can therefore use on-device computation resources more efficiently. Compared with an LAS-based baseline, our proposed Transformer rescorer achieves more than 50% latency reduction with quality improvement.
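
As a rough illustration of the second-pass rescoring idea described in the abstract, the following Python sketch rescores a small n-best list. It is a toy under stated assumptions, not the paper's model: the bigram table `TOY_BIGRAM_LOGP`, the helper names, the example hypotheses, and the interpolation `weight` are all hypothetical, and the audio context that the real rescorer attends to is omitted. The point it shows is that, because the hypothesis tokens are already known, each position can be scored from its prefix without waiting for a recurrent state from the previous step, which is what makes parallel evaluation possible.

```python
import math

# Hypothetical toy log-probabilities standing in for the 2nd-pass model.
TOY_BIGRAM_LOGP = {
    ("<s>", "play"): math.log(0.6),
    ("<s>", "pay"): math.log(0.4),
    ("play", "music"): math.log(0.9),
    ("pay", "music"): math.log(0.1),
}
UNSEEN_LOGP = math.log(1e-3)  # assumed floor for unseen pairs


def position_log_prob(tokens, i):
    """Log-probability of tokens[i] given the already-known prefix."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    return TOY_BIGRAM_LOGP.get((prev, tokens[i]), UNSEEN_LOGP)


def second_pass_score(tokens):
    """Sum of per-position scores.

    No position needs the result of another position, so a real
    implementation could evaluate all of them in one batched call.
    """
    return sum(position_log_prob(tokens, i) for i in range(len(tokens)))


def rescore(nbest, weight=0.5):
    """Pick the best hypothesis from (tokens, first_pass_logp) pairs.

    The linear interpolation with `weight` is an assumption made for
    this sketch, not something stated in the abstract.
    """
    def combined(item):
        tokens, first_pass_logp = item
        return (1 - weight) * first_pass_logp + weight * second_pass_score(tokens)

    return max(nbest, key=combined)[0]


# Hypothetical 1st-pass output: the top hypothesis contains an error.
nbest = [
    (["pay", "music"], math.log(0.55)),
    (["play", "music"], math.log(0.45)),
]
best = rescore(nbest)
```

In this toy run the second pass prefers the hypothesis that the first pass ranked lower, so `best` is `["play", "music"]`.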

Research Areas

Machine intelligence
