DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION

Golan Pundak

Tara Sainath

Rohit Prabhavalkar

Anjuli Kannan

Ding Zhao

IEEE SLT (2018)

Download Google Scholar

Abstract

In automatic speech recognition (ASR) what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR system that utilizes such context. Our approach, which we refer to as Contextual Listen, Attend and Spell (CLAS) jointlyoptimizes the ASR components along with embeddings of the context n-grams. During inference, the CLAS system can be presented with context phrases which might contain out-ofvocabulary (OOV) terms not seen during training. We compare our proposed system to a more traditional contextualization approach, which performs shallow-fusion between independently trained LAS and contextual n-gram models during beam search. Across a number of tasks, we find that the proposed CLAS system outperforms the baseline method by as much as 68% relative WER, indicating the advantage of joint optimization over individually trained components. Index Terms: speech recognition, sequence-to-sequence models, listen attend and spell, LAS, attention, embedded speech recognition.

Research Areas

Speech Processing

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION

Abstract

Research Areas

Learn more about how we conduct our research

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

DEEP CONTEXT: END-TO-END CONTEXTUAL SPEECH RECOGNITION

Abstract

Research Areas

Learn more about how we conduct our research

AI/ML Foundations  & Capabilities