Handling Background Noise in Neural Speech Generation

Tom Denton
Alejandro Luebs
Andrew Storus
Hengchin Ye
W. Bastiaan Kleijn
2020 Asilomar Conference on Signals, Systems, and Computers (2021)

Abstract

Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding. However, the performance of such models drops when the input is not clean speech, e.g., in the presence of background noise, preventing their use in practical applications. In this paper, we examine the reason for this degradation and discuss methods to overcome it. Placing a denoising preprocessing stage before feature extraction, and using clean speech as the target during training, is shown to be the best-performing strategy.
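The training strategy summarized above pairs features computed from denoised noisy speech with a clean-speech target. The following is a minimal sketch of how one such training pair might be built, assuming additive noise mixed at a chosen SNR. The functions mix_at_snr, frame_log_energy and make_training_pair are hypothetical, frame_log_energy is only a placeholder for the coder's real conditioning features, and the denoise argument stands in for whatever denoising preprocessor is used; none of these reflect the paper's actual implementation.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that the clean-to-noise power ratio equals snr_db (in dB), then add it."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def frame_log_energy(x, frame_len=320, hop=160):
    """Hypothetical stand-in for the coder's conditioning features:
    per-frame log energy (a real system would likely use richer features, e.g. log-mel spectra)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.mean(frames ** 2, axis=1) + 1e-8)


def make_training_pair(clean, noise, snr_db, denoise):
    """Build one (conditioning features, target waveform) pair:
    - input side: noisy speech -> denoiser -> features
    - target side: the clean speech itself, not the noisy or denoised signal
    """
    noisy = mix_at_snr(clean, noise, snr_db)
    features = frame_log_energy(denoise(noisy))
    return features, clean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)   # placeholder for 1 s of 16 kHz speech
    noise = rng.standard_normal(16000)   # placeholder background noise
    # Identity function used as a hypothetical placeholder denoiser.
    feats, target = make_training_pair(clean, noise, snr_db=5.0, denoise=lambda x: x)
    print(feats.shape)  # one feature value per 20 ms frame with a 10 ms hop
```

The design point the sketch illustrates is that the denoiser sits only on the feature-extraction path, while the generative model is still trained to reproduce clean speech.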