An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling

Rami Botros
Ruoming Pang
James Qin
Quoc-Nam Le-The
Anmol Gulati
Chung-Cheng Chiu
Emmanuel Guzman
Jiahui Yu
Qiao Liang
Wei Li
Yu Zhang
Interspeech (2021) (to appear)

Abstract

On-device end-to-end (E2E) models have shown improvements over a conventional model on Search test sets in both quality, as measured by Word Error Rate (WER), and latency, measured by the time the result is finalized after the user stops speaking. However, the E2E model is trained on a small fraction of audio-text pairs compared to the 100 billion text utterances that a conventional language model (LM) is trained with. Thus, E2E models perform poorly on rare words and phrases. In this paper, building upon the two-pass streaming Cascaded Encoder E2E model, we explore using a Hybrid Autoregressive Transducer (HAT) factorization to better integrate an on-device neural LM trained on text-only data. Furthermore, to reduce decoder latency, we replace the typical LSTM decoder in the Cascaded Encoder model with a non-recurrent embedding decoder. Overall, we present a streaming on-device model that incorporates an external neural LM and outperforms the conventional model in both search and rare-word quality, as well as latency, while being 318X smaller.
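To illustrate the HAT factorization mentioned in the abstract, below is a minimal sketch of how an external neural LM score might be combined with a transducer's label score during beam search, with an internal LM estimate subtracted out. The function name, arguments, and interpolation weights are illustrative assumptions for exposition, not the paper's implementation.

```python
import math


def hat_fused_log_score(log_p_label: float,
                        log_p_internal_lm: float,
                        log_p_external_lm: float,
                        ilm_weight: float = 0.1,
                        lm_weight: float = 0.3) -> float:
    """Sketch of HAT-style external LM integration for one candidate label.

    The HAT factorization separates the blank (duration) score from the
    label score, which makes the decoder's internal LM estimable. During
    beam search, that internal LM estimate can be subtracted and an
    external neural LM, trained on text-only data, added in its place.
    The weights here are hypothetical tuning knobs.
    """
    return (log_p_label
            - ilm_weight * log_p_internal_lm
            + lm_weight * log_p_external_lm)


# Hypothetical usage for a single beam-search candidate:
score = hat_fused_log_score(log_p_label=math.log(0.4),
                            log_p_internal_lm=math.log(0.2),
                            log_p_external_lm=math.log(0.5))
```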
