Incorporating Written Domain Numeric Grammars Into End-to-End Contextual Speech Recognition Systems For Improved Recognition of Numeric Sequences

Ben Haynor
2019 IEEE Automatic Speech Recognition and Understanding Workshop (2020)

Abstract

Accurate recognition of numeric sequences is crucial for a number of
contextual speech recognition applications. For example, a user might
create a calendar event and be prompted by a virtual assistant for the
time, date, and duration of the event. We propose using finite state
transducers built from written domain numeric grammars to increase the
likelihood of hypotheses matching these grammars during beam search in an
end-to-end speech recognition system.

Our technique yields significant reductions in word error rate (up to 59%)
on a variety of numeric sequence recognition tasks (times, percentages,
digit sequences).
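
The biasing idea described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the tiny hand-written acceptor for written-domain times such as "7:30", the per-character bonus, and the names build_time_acceptor, grammar_bonus, and rescore are all assumptions made for illustration. A hypothesis that stays on a valid path through the grammar receives a score bonus; one that leaves the grammar receives none.

```python
from typing import Dict, List, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_time_acceptor() -> Transitions:
    """Toy character-level acceptor for 12-hour times like '7:30' or '12:05'.

    Hypothetical stand-in for an FST compiled from a written-domain grammar.
    States: 0 start, 1 after leading '1', 2 hour complete,
            3 after ':', 4 after minute tens digit, 5 complete time.
    """
    t: Transitions = {(0, "1"): 1, (1, ":"): 3, (2, ":"): 3}
    for d in "23456789":      # single-digit hours 2-9
        t[(0, d)] = 2
    for d in "012":           # hours 10, 11, 12
        t[(1, d)] = 2
    for d in "012345":        # minute tens digit
        t[(3, d)] = 4
    for d in "0123456789":    # minute ones digit
        t[(4, d)] = 5
    return t


def grammar_bonus(text: str, transitions: Transitions,
                  bonus_per_char: float = 0.5) -> float:
    """Return a bonus if `text` is a valid prefix of the grammar, else 0.0."""
    state = 0
    for ch in text:
        nxt = transitions.get((state, ch))
        if nxt is None:       # left the grammar: no bonus is granted
            return 0.0
        state = nxt
    return bonus_per_char * len(text)


def rescore(beam: List[Tuple[str, float]], transitions: Transitions,
            bonus_per_char: float = 0.5) -> List[Tuple[str, float]]:
    """Add the grammar bonus to each hypothesis log-score and re-sort the beam."""
    scored = [(text, score + grammar_bonus(text, transitions, bonus_per_char))
              for text, score in beam]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    acceptor = build_time_acceptor()
    # Hypothetical partial hypotheses with made-up log-probabilities.
    beam = [("seven thirty", -1.0), ("730", -1.5), ("7:30", -1.8)]
    for text, score in rescore(beam, acceptor):
        print(f"{score:6.2f}  {text}")
```

In this sketch the hypothesis "7:30" starts with the lowest model score but ranks first after the bonus is applied, because it is the only one that follows the grammar. In an actual decoder the bonus would be applied to partial hypotheses at each beam search step rather than once to a finished beam.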
