Google Research

EXPLORING TRADEOFFS IN MODELS FOR LOW-LATENCY SPEECH ENHANCEMENT

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement (2018)

Abstract

We explore a variety of configurations of neural networks for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on state-of-the-art performance on the CHiME2 speech enhancement task. We examine trade-offs among non-causal lookahead, compute work, and parameter count versus enhancement performance and find that zero-lookahead models can achieve, on average, only 0.5 dB worse performance than our best bidirectional model. Further, we find that 200 milliseconds of lookahead is sufficient to achieve performance within about 0.2 dB from our best bidirectional model.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work