End-to-End Streaming Keyword Spotting

Raziel Alvarez; Hyun Jin Park

International Conference on Acoustics, Speech, and Signal Processing (2019) (to appear)

Abstract

We present a keyword spotting system that, except for a frontend component for feature generation, is entirely contained in a deep neural network (DNN) model trained "end-to-end" to predict the presence of the keyword in a stream of audio. This work makes two main contributions. The first is an efficient memoized neural network topology that aims to make better use of the parameters and associated computations in the DNN by holding a memory of previous activations distributed over the depth of the DNN. The second is a method to train the DNN, end-to-end, to produce the keyword spotting score. The system significantly outperforms previous approaches in detection quality as well as in size and computation.
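
The idea of holding a memory of previous activations at each depth of the network can be illustrated with a small sketch. This is not the authors' topology or training method: the class name `MemoizedLayer`, the layer sizes, the memory length, the ReLU and sigmoid choices, and the random (untrained) weights are all assumptions made for illustration. Each layer keeps a fixed-length buffer of its most recent inputs, so processing a new audio frame reuses stored activations instead of recomputing over the whole past window.

```python
import math
import random
from collections import deque


class MemoizedLayer:
    """Hypothetical streaming layer that remembers its last `memory` input frames."""

    def __init__(self, in_dim, out_dim, memory, rng):
        # One weight row per output unit, spanning the whole remembered window.
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(memory * in_dim)]
                        for _ in range(out_dim)]
        # Buffer of past activations, zero-filled until real frames arrive.
        self.buffer = deque([[0.0] * in_dim for _ in range(memory)], maxlen=memory)

    def step(self, frame):
        # Store the new frame; the oldest one drops out automatically.
        self.buffer.append(list(frame))
        window = [x for past in self.buffer for x in past]
        # Linear projection of the window followed by a ReLU.
        return [max(0.0, sum(w * x for w, x in zip(row, window)))
                for row in self.weights]


def stream_scores(layers, out_weights, frames):
    """Feed frames one at a time and return one score in (0, 1) per frame."""
    scores = []
    for frame in frames:
        hidden = frame
        for layer in layers:
            hidden = layer.step(hidden)
        logit = sum(w * h for w, h in zip(out_weights, hidden))
        scores.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
    return scores


rng = random.Random(0)
# Two stacked layers, each with its own memory of 4 past frames (assumed sizes).
layers = [MemoizedLayer(40, 16, memory=4, rng=rng),
          MemoizedLayer(16, 8, memory=4, rng=rng)]
out_weights = [rng.gauss(0.0, 0.1) for _ in range(8)]
# 20 frames of 40 placeholder features standing in for frontend output.
frames = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(20)]
scores = stream_scores(layers, out_weights, frames)
```

Because the weights here are random, the scores carry no meaning; the sketch only shows the per-frame data flow, where each call to `step` touches one new frame plus the buffered history at that depth.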

Research Areas

Machine intelligence
