- Assaf Michaely
- Carolina Parada
- Frank Zhang
- Gabor Simko
- Petar Aleksic
Abstract
We present a novel approach for improving overall quality of keyword spotting using contextual automatic speech recognition (ASR) system. On voice-activated devices with limited resources, it is common that a keyword spotting system is run on the device in order to detect a trigger phrase (e.g. “ok google”) and decide which audio should be sent to the server (to be transcribed by the ASR system and processed to generate a response to the user). Due to limited resources on a device, the device keyword spotting system might introduce false accepts (FAs) and false rejects (FRs) that can cause a negative user experience. We describe a system that uses server-side contextual ASR and dynamic classes for improved keyword spotting. We show that this method can significantly reduce FA rates (by 89%) while minimally increasing FR rate (0.15%). Furthermore, we show that this system helps reduce Word Error Rate (WER) (by 10% to 50% relative, on different test sets) and allows users to speak seamlessly, without pausing between the trigger phrase and the command.
Research Areas
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work