- Brian Patton
- Jan Skoglund
- Jeremy Thorpe
- John Hershey
- Kevin Wilson
- Michael Chinen
- Richard F. Lyon
- Rif A. Saurous
Proceedings of the 16th International Workshop on Acoustic Signal Enhancement (2018)
We explore a variety of configurations of neural networks for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on state-of-the-art performance on the CHiME2 speech enhancement task. We examine trade-offs among non-causal lookahead, compute work, and parameter count versus enhancement performance, and find that zero-lookahead models perform, on average, only 0.5 dB worse than our best bidirectional model. Further, we find that 200 milliseconds of lookahead is sufficient to achieve performance within about 0.2 dB of our best bidirectional model.
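To make the spectrogram-mask framing concrete, the sketch below shows the generic pipeline such models share: compute an STFT of the noisy signal, estimate a per-time-frequency gain mask from a limited number of future frames (the lookahead), apply the mask, and resynthesize. This is not the paper's network; the mask estimator here is a hypothetical Wiener-style stand-in, and the function names, window sizes, and noise-floor heuristic are assumptions chosen only to make the example runnable.

```python
import numpy as np
from scipy.signal import stft, istft


def estimate_mask(noisy_mag, noise_floor, lookahead_frames=0):
    """Stand-in mask estimator (hypothetical; a neural network in the paper).

    For frame t it may only use past/present frames plus `lookahead_frames`
    of future context, mimicking the causal/lookahead constraint. Here a
    simple Wiener-style gain against a fixed noise floor keeps the pipeline
    runnable end to end.
    """
    n_frames = noisy_mag.shape[1]
    mask = np.zeros_like(noisy_mag)
    for t in range(n_frames):
        # Visible context: a couple of past frames, the current frame, and
        # up to `lookahead_frames` future frames.
        lo, hi = max(t - 2, 0), min(t + 1 + lookahead_frames, n_frames)
        local_power = np.mean(noisy_mag[:, lo:hi] ** 2, axis=1)
        snr = np.maximum(local_power / (noise_floor ** 2 + 1e-8) - 1.0, 0.0)
        mask[:, t] = snr / (snr + 1.0)  # Wiener gain in [0, 1)
    return mask


def enhance(noisy, fs=16000, n_fft=512, hop=128, lookahead_frames=0):
    """Spectrogram-mask enhancement: mask the noisy STFT, then resynthesize."""
    _, _, spec = stft(noisy, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    mag = np.abs(spec)
    # Crude noise-floor estimate from the quietest frames (an assumption,
    # standing in for whatever statistics a learned model would capture).
    noise_floor = np.percentile(mag, 10, axis=1)
    mask = estimate_mask(mag, noise_floor, lookahead_frames)
    _, enhanced = istft(spec * mask, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    return enhanced


# Example: with a 128-sample hop at 16 kHz (8 ms per frame), 200 ms of
# lookahead corresponds to 25 frames.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000).astype(np.float32)
out = enhance(noisy, lookahead_frames=25)
```

The `lookahead_frames` parameter is the knob the abstract's trade-off study varies: 0 gives a zero-lookahead (causal) system, while larger values trade added latency for mask estimates informed by future context.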