An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model

Anjuli Kannan; Yonghui Wu; Patrick Nguyen; Tara N. Sainath; Zhifeng Chen; Rohit Prabhavalkar

ICASSP (2018)

Abstract

Attention-based sequence-to-sequence models for automatic
speech recognition jointly train an acoustic model, a language model,
and an alignment mechanism. As a result, the language model component
is trained only on transcribed audio-text pairs, which motivates the use
of shallow fusion with an external language model at inference time.
Shallow fusion refers to log-linear interpolation with a separately
trained language model at each step of the beam search. In this
work, we investigate the behavior of shallow fusion across a range of
conditions: different types of language models, different decoding
units, and different tasks. On Google Voice Search, we demonstrate
that shallow fusion with a wordpiece-based neural LM yields a 9.1%
relative word error rate reduction (WERR) over our competitive
attention-based sequence-to-sequence model, obviating the need for
second-pass rescoring.
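
As a rough illustration of the log-linear interpolation described above, the sketch below scores each beam-search extension as log p_s2s(y_t | y_<t, x) + λ · log p_LM(y_t | y_<t), where λ (here `lm_weight`) weights the external LM. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `shallow_fusion_beam_search`, the `Scorer` interface, the `LM_FLOOR` constant, and the toy models `toy_s2s` and `toy_lm` are all hypothetical, and a real system would operate on batched neural models over wordpieces rather than Python dictionaries.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: given a token prefix, return next-token log-probs.
Scorer = Callable[[Tuple[int, ...]], Dict[int, float]]

# Assumed floor for tokens the external LM does not cover; a finite value
# avoids 0 * -inf = nan when lm_weight is 0.
LM_FLOOR = -1e9


def shallow_fusion_beam_search(
    s2s_logprobs: Scorer,
    lm_logprobs: Scorer,
    lm_weight: float,
    beam_size: int,
    eos: int,
    max_len: int,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Beam search whose per-step token score is
    log p_s2s(y_t | y_<t, x) + lm_weight * log p_LM(y_t | y_<t)."""
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    finished: List[Tuple[Tuple[int, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            am = s2s_logprobs(prefix)  # seq2seq model, conditioned on audio x
            lm = lm_logprobs(prefix)   # separately trained text-only LM
            for tok, am_lp in am.items():
                fused = am_lp + lm_weight * lm.get(tok, LM_FLOOR)
                candidates.append((prefix + (tok,), score + fused))
        # Keep the best beam_size extensions by cumulative fused score.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len was hit
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Toy usage: token 0 = "a", 1 = "b", 2 = end-of-sentence.
def toy_s2s(prefix):
    return {0: math.log(0.6), 1: math.log(0.4)} if not prefix else {2: 0.0}


def toy_lm(prefix):
    return {0: math.log(0.2), 1: math.log(0.8)} if not prefix else {2: 0.0}


no_lm = shallow_fusion_beam_search(toy_s2s, toy_lm, 0.0, 2, eos=2, max_len=3)
fused = shallow_fusion_beam_search(toy_s2s, toy_lm, 1.0, 2, eos=2, max_len=3)
print(no_lm[0][0], fused[0][0])  # (0, 2) (1, 2)
```

With `lm_weight` set to 0 the search reduces to ordinary beam search over the sequence-to-sequence model alone, which picks "a"; with a positive weight the external LM's strong preference for "b" changes the top hypothesis. In practice the interpolation weight is a hyperparameter that would typically be tuned on held-out data.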