Contextual prediction models for speech recognition

Yoni Halpern
Keith Hall
Vlad Schogol
Martin Baeuml
Proceedings of Interspeech 2016

Abstract

We introduce an approach to biasing language models towards
known contexts without requiring separate language models or
explicit context-dependent conditioning. We do
so by presenting an alternative ASR objective, where we predict
the acoustics and words given the contextual cue, such as
the geographic location of the speaker. A simple factoring of the
model results in an additional biasing term, which effectively
indicates how correlated a hypothesis is with the contextual cue
(e.g., given the hypothesized transcript, how likely is the user’s
known location). We demonstrate that this factorization allows
us to train relatively small contextual models which are effective
in speech recognition. An experimental analysis shows both a
perplexity reduction and a significant word error rate reduction
on a voice search task when using the user’s location as a contextual
cue.
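
The factoring described above can be sketched as follows. This is an illustrative reconstruction rather than the paper's exact derivation: the symbols A (acoustics), W (hypothesized word sequence), and C (contextual cue, e.g. the user's location) are notation assumed here, and the third step assumes that the acoustics are conditionally independent of the cue given the words.

```latex
% Assumed notation: A = acoustics, W = word sequence, C = contextual cue.
% Step 3 assumes P(A | W, C) \approx P(A | W) and applies Bayes' rule to P(W | C).
\begin{align}
\hat{W} &= \arg\max_{W} P(A, W \mid C) \\
        &= \arg\max_{W} P(A \mid W, C)\, P(W \mid C) \\
        &\approx \arg\max_{W} P(A \mid W)\, \frac{P(C \mid W)\, P(W)}{P(C)} \\
        &= \arg\max_{W} P(A \mid W)\, P(W)\, P(C \mid W)
\end{align}
```

Under these assumptions, P(A | W) and P(W) are the usual acoustic and language model terms, and P(C | W) plays the role of the additional biasing term: how likely the known cue is given the hypothesized transcript. P(C) does not depend on W, so it drops out of the maximization.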