Keyword Spotting for Google Assistant Using Contextual Speech Recognition

Frank Zhang
Gabor Simko
ASRU 2017, IEEE

Abstract

We present a novel approach for improving the overall quality of
keyword spotting using a contextual automatic speech recognition
(ASR) system. On voice-activated devices with limited resources,
a keyword spotting system commonly runs on the device to detect
a trigger phrase (e.g. “ok google”) and decide which audio should
be sent to the server (to be transcribed by the ASR system and
processed to generate a response to the user). Because of these
resource limits, the on-device keyword spotting system might
introduce false accepts (FAs) and false rejects (FRs) that can
cause a negative user experience.
We describe a system that uses server-side contextual ASR and
dynamic classes for improved keyword spotting. We show that
this method can significantly reduce FA rates (by 89%) while
minimally increasing the FR rate (0.15%). Furthermore, we show
that this system helps reduce the word error rate (WER) (by 10%
to 50% relative, on different test sets) and allows users to speak
seamlessly, without pausing between the trigger phrase and the
command.
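
The two-stage idea in the abstract (on-device detection followed by a server-side check) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe how contextual ASR or dynamic classes are realized, so the sketch simply assumes that the server already has a text transcript of the audio and that verification amounts to checking whether the transcript begins with the trigger phrase. The names `TRIGGER_PHRASES`, `normalize`, and `verify_and_strip` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical trigger list; the abstract only mentions "ok google".
TRIGGER_PHRASES = ("ok google",)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation (except apostrophes), collapse spaces."""
    text = re.sub(r"[^a-z0-9'\s]", " ", text.lower())
    return " ".join(text.split())


def verify_and_strip(transcript: str) -> Optional[str]:
    """Assumed server-side check on an ASR transcript.

    Returns the command with the trigger phrase removed if the
    transcript starts with a trigger phrase, or None if it does not
    (treated here as a false accept by the on-device spotter).
    """
    words = normalize(transcript)
    for phrase in TRIGGER_PHRASES:
        # Match the whole phrase only, so "ok googled" is not accepted.
        if words == phrase or words.startswith(phrase + " "):
            return words[len(phrase):].strip()
    return None
```

In this sketch a `None` result would mean the audio is discarded rather than answered, which is where a reduction in false accepts would come from; the trigger phrase and command are handled in one transcript, so no pause between them is needed.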
