Incorporating Written Domain Numeric Grammars Into End-to-End Contextual Speech Recognition Systems For Improved Recognition of Numeric Sequences

2019 IEEE Automatic Speech Recognition and Understanding Workshop (2020)


Accurate recognition of numeric sequences is crucial for many contextual speech recognition applications. For example, a user might create a calendar event and be prompted by a virtual assistant for the time, date, and duration of the event. We propose using finite state transducers built from written domain numeric grammars to increase the likelihood of hypotheses that match these grammars during beam search in an end-to-end speech recognition system.

Our technique yields significant reductions in word error rate (up to 59%) on a variety of numeric sequence recognition tasks (times, percentages, digit sequences).
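
The idea of boosting grammar-matching hypotheses during beam search can be illustrated with a minimal sketch. It is not the authors' implementation: a small hand-written character acceptor for times such as "7:30" stands in for a compiled finite state transducer, and the names `step`, `grammar_bonus`, `rescore`, and the `per_char_bonus` value are all hypothetical. The acceptor also does not validate the hour range.

```python
DIGITS = set("0123456789")

def step(state, ch):
    """Advance a toy acceptor for written-domain times (H:MM or HH:MM).

    States: 0 start, 1 one hour digit, 2 two hour digits,
    3 after ':', 4 one minute digit, 5 complete time.
    Returns the next state, or None if ch leaves the grammar.
    """
    if state == 0 and ch in DIGITS:
        return 1
    if state == 1 and ch in DIGITS:
        return 2
    if state in (1, 2) and ch == ":":
        return 3
    if state == 3 and ch in "012345":
        return 4
    if state == 4 and ch in DIGITS:
        return 5
    return None

def grammar_bonus(text, per_char_bonus=0.5):
    """Bonus for a (partial) hypothesis whose last token is still on a grammar path."""
    tokens = text.split()
    if not tokens:
        return 0.0
    state = 0
    for ch in tokens[-1]:
        state = step(state, ch)
        if state is None:
            return 0.0  # token left the grammar: no bonus
    return per_char_bonus * len(tokens[-1])

def rescore(hypotheses):
    """Sort (text, log_prob) pairs by log_prob plus grammar bonus, best first."""
    return sorted(hypotheses, key=lambda h: h[1] + grammar_bonus(h[0]), reverse=True)

beam = [("meet at 7 30", -1.0), ("meet at 7:30", -1.2)]
best_text, _ = rescore(beam)[0]
print(best_text)
```

In this toy beam, the hypothesis containing the well-formed time overtakes the one with the higher model score because it matches more of the grammar. A real system would apply such a bonus incrementally at each decoding step rather than rescoring complete strings.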

