Incorporating Written Domain Numeric Grammars Into End-to-End Contextual Speech Recognition Systems For Improved Recognition of Numeric Sequences

Ben Haynor; Petar S. Aleksic

2019 IEEE Automatic Speech Recognition and Understanding Workshop (2020)

Abstract

Accurate recognition of numeric sequences is crucial for a number of contextual speech
recognition applications. For example, a user might create a calendar
event and be prompted by a virtual assistant for the time, date, and
duration of the event. We propose using finite state
transducers built from written domain numeric grammars to increase the
likelihood of hypotheses matching these grammars during beam search in an end-to-end speech
recognition system.
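
As a rough illustration of the idea (not the authors' implementation), the sketch below biases a toy character-level beam search toward hypotheses that follow a written-domain time grammar. Everything in it is an assumption made for the example: the hand-written acceptor in `build_time_acceptor`, the fixed per-step log-probabilities in `STEPS` that stand in for an end-to-end model's output, the constant `bonus`, and the simplification that a hypothesis which leaves the grammar simply stops receiving the bonus.

```python
import math


def build_time_acceptor():
    """Hypothetical acceptor for times such as "7:30" or "12:05".

    Returns transitions {(state, char): next_state}; state 0 is the start
    state and state 6 is the only final state.
    """
    t = {(0, "1"): 1}              # hour starts with 1: may be 1, 10, 11, 12
    for d in "23456789":
        t[(0, d)] = 2              # single-digit hour 2-9
    for d in "012":
        t[(1, d)] = 3              # two-digit hour 10-12
    for s in (1, 2, 3):
        t[(s, ":")] = 4            # colon after a complete hour
    for d in "012345":
        t[(4, d)] = 5              # minute tens digit
    for d in "0123456789":
        t[(5, d)] = 6              # minute ones digit
    return t


def beam_search(steps, transitions, bonus=0.0, beam_size=3):
    """Toy beam search; adds `bonus` for each character that advances the grammar."""
    # Each hypothesis is (text, score, grammar_state); None means off-grammar.
    beam = [("", 0.0, 0)]
    for dist in steps:
        candidates = []
        for text, score, state in beam:
            for ch, logp in dist.items():
                nxt = transitions.get((state, ch)) if state is not None else None
                boost = bonus if nxt is not None else 0.0
                candidates.append((text + ch, score + logp + boost, nxt))
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][0]


# Made-up per-step log-probabilities standing in for model output.
STEPS = [
    {"7": math.log(0.9), "a": math.log(0.1)},
    {":": math.log(0.4), " ": math.log(0.6)},
    {"3": math.log(0.9), "e": math.log(0.1)},
    {"0": math.log(0.9), "o": math.log(0.1)},
]

ACCEPTOR = build_time_acceptor()
unbiased = beam_search(STEPS, ACCEPTOR, bonus=0.0)
biased = beam_search(STEPS, ACCEPTOR, bonus=1.0)
print(repr(unbiased), repr(biased))
```

With these made-up scores, the unbiased search returns `7 30`, while the biased search returns `7:30`, because each character that stays on the grammar path earns the bonus. A production system would more plausibly compile the grammars with a finite state transducer toolkit and operate on the model's own output units.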

Our technique yields significant reductions in word error rate (up to 59%)
on a variety of numeric sequence recognition tasks (times, percentages,
digit sequences).
