Trainable Frontend For Robust and Far-Field Keyword Spotting

Yuxuan Wang; Pascal Getreuer; Thad Hughes; Richard F. Lyon; Rif A. Saurous

Proc. IEEE ICASSP 2017, New Orleans, LA

Abstract

Robust and far-field speech recognition is critical to enabling true hands-free communication. In far-field conditions, signals are attenuated due to distance. To improve robustness to loudness variation, we introduce a novel frontend called per-channel energy normalization (PCEN). The key ingredient of PCEN is dynamic compression based on automatic gain control, which replaces the widely used static (such as log or root) compression. We evaluate PCEN on the keyword spotting task. On our large rerecorded noisy and far-field evaluation sets, we show that PCEN significantly improves recognition performance. Furthermore, we model PCEN as neural network layers and optimize high-dimensional PCEN parameters jointly with the keyword spotting acoustic model. The trained PCEN frontend demonstrates significant further improvements without increasing model complexity or inference-time cost.
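
The abstract describes PCEN only in words, so the following is a minimal sketch rather than the authors' implementation. It assumes the commonly cited form of PCEN: each filterbank channel's energy E is divided by a smoothed copy of itself, M, raised to a power alpha (the gain control step), and the result is root-compressed with an offset delta and exponent r. The function name `pcen`, the choice to initialize the smoother with the first frame, and the default parameter values are assumptions made for illustration. In the trainable variant the abstract mentions, these parameters would be learned per channel rather than fixed.

```python
def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply a PCEN-style dynamic compression to filterbank energies.

    energy: list of frames; each frame is a list of non-negative
            per-channel energies (for example, mel filterbank outputs).
    All parameter defaults are illustrative assumptions.
    """
    if not energy:
        return []
    # Assumption: start the smoother at the first frame's energies.
    smoothed = list(energy[0])
    out = []
    for frame in energy:
        # First-order IIR smoother per channel:
        #   M[t] = (1 - s) * M[t-1] + s * E[t]
        smoothed = [(1.0 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Gain control (divide by M**alpha), then offset root compression.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out


# Usage: the same steady signal at two loudness levels, 100x apart.
loud = pcen([[100.0, 100.0]] * 50)
quiet = pcen([[1.0, 1.0]] * 50)
gap = abs(loud[-1][0] - quiet[-1][0])
```

With alpha close to 1, the division by the smoothed energy cancels most of the overall level, so `gap` stays small even though the inputs differ by a factor of 100. A static log compression would instead shift the two outputs apart by log(100), which is the loudness sensitivity the abstract says PCEN is meant to reduce.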
