EXPLORING TRADEOFFS IN MODELS FOR LOW-LATENCY SPEECH ENHANCEMENT

Brian Patton; Jan Skoglund; Jeremy Thorpe; John Hershey; Kevin Wilson; Michael Chinen; Richard F. Lyon; Rif A. Saurous

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement (2018)

Abstract

We explore a variety of neural network configurations for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on state-of-the-art performance on the CHiME2 speech enhancement task. We examine trade-offs among non-causal lookahead, computation, and parameter count versus enhancement performance, and find that zero-lookahead models perform, on average, only 0.5 dB worse than our best bidirectional model. Further, we find that 200 milliseconds of lookahead is sufficient to come within about 0.2 dB of our best bidirectional model.
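
As a rough illustration of the two ideas the abstract relies on, the sketch below shows what spectrogram masking and a lookahead budget mean in practice. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `apply_mask` and `lookahead_frames` are hypothetical, the mask is assumed to be real-valued in [0, 1], and the 10 ms frame hop is an assumed value that does not come from the abstract.

```python
def apply_mask(noisy_spec, mask):
    """Scale each time-frequency bin of a complex spectrogram by a real mask.

    noisy_spec: list of frames, each a list of complex STFT bins.
    mask: same shape, real values (assumed here to lie in [0, 1]).
    """
    if len(noisy_spec) != len(mask):
        raise ValueError("spectrogram and mask must have the same number of frames")
    enhanced = []
    for frame, mask_frame in zip(noisy_spec, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("each frame and its mask must have the same number of bins")
        # Attenuate the magnitude of each bin; the noisy phase is left unchanged.
        enhanced.append([m * x for x, m in zip(frame, mask_frame)])
    return enhanced


def lookahead_frames(lookahead_ms, hop_ms):
    """Number of whole future frames that fit inside a lookahead budget."""
    if hop_ms <= 0:
        raise ValueError("hop_ms must be positive")
    return int(lookahead_ms // hop_ms)


# Toy 2-frame, 2-bin spectrogram and a hand-written mask (illustrative values only).
spec = [[1 + 1j, 2 + 0j], [0 + 2j, 4 + 4j]]
mask = [[0.5, 1.0], [0.0, 0.25]]
enhanced = apply_mask(spec, mask)

# With an assumed 10 ms hop, 200 ms of lookahead spans 20 future frames;
# a zero-lookahead model sees none.
frames_200ms = lookahead_frames(200, 10)
frames_causal = lookahead_frames(0, 10)
```

In a mask-based system, the network's job is to predict `mask` from the noisy input. The lookahead budget limits how many future frames the network may consult before emitting the mask for the current frame, which is the latency trade-off the abstract examines.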

Research Areas

Machine perception
