Joint noise adaptive training for robust automatic speech recognition

DeLiang Wang
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), IEEE (2014), pp. 2523-2527

Abstract

We explore time-frequency masking to improve noise-robust automatic speech recognition. Apart from its use as a front end, we use masking to provide smooth estimates of speech and noise, which are then passed as additional features to a deep neural network (DNN) based acoustic model. Such a system improves performance on the Aurora-4 dataset by 10.5% (relative) compared to the previous best published results. By formulating separation as a supervised mask estimation problem, we develop a unified DNN framework that jointly improves separation and acoustic modeling. Our final system outperforms the previous best system on the CHiME-2 corpus by 22.1% (relative).
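To make the feature pipeline concrete, the following is a minimal sketch of the general idea of using an estimated time-frequency mask to derive speech and noise estimates and appending them to the noisy features fed to the acoustic model. The function name, smoothing window, and feature dimensions are illustrative assumptions, not details taken from the paper; the mask itself would come from a separately trained mask-estimation DNN.

```python
import numpy as np

def masked_speech_noise_features(noisy_spec, est_mask, smooth_frames=3):
    """Append mask-derived speech and noise estimates to noisy features.

    noisy_spec: (frames x channels) noisy magnitude features.
    est_mask:   (frames x channels) estimated T-F mask in [0, 1],
                assumed to come from a mask-estimation DNN (not shown).
    smooth_frames: moving-average window over time (illustrative choice).
    """
    speech_est = est_mask * noisy_spec            # masked speech estimate
    noise_est = (1.0 - est_mask) * noisy_spec     # complementary noise estimate

    # Smooth each channel over time with a short moving average.
    kernel = np.ones(smooth_frames) / smooth_frames
    def smooth(x):
        return np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, x)

    # Concatenate noisy, speech, and noise features per frame as the
    # input representation for the DNN acoustic model.
    return np.concatenate(
        [noisy_spec, smooth(speech_est), smooth(noise_est)], axis=1)

# Example with placeholder data: 100 frames, 64 channels.
noisy = np.abs(np.random.randn(100, 64))
mask = np.clip(np.random.rand(100, 64), 0.0, 1.0)
features = masked_speech_noise_features(noisy, mask)
print(features.shape)  # (100, 192)
```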
