Exploring speech enhancement with generative adversarial networks for robust speech recognition

Chris Donahue

Bo Li

Rohit Prabhavalkar

Proc. ICASSP (2018)

Abstract

We investigate the effectiveness of generative adversarial
networks (GANs) for speech enhancement, in the context of
improving noise robustness of automatic speech recognition
(ASR) systems. Prior work demonstrates that GANs can
effectively suppress additive noise in raw waveform speech
signals, improving perceptual quality metrics; however, this
technique was not justified in the context of ASR. In this
work, we conduct a detailed study to measure the effectiveness
of GANs in enhancing speech contaminated by both
additive and reverberant noise. Motivated by recent advances
in image processing, we propose operating GANs on
log-Mel filterbank spectra instead of waveforms, which requires
less computation and is more robust to reverberant
noise. While GAN enhancement improves the performance
of an out-of-box ASR system on noisy speech, it falls short
of the performance achieved by conventional multi-style
training (MTR). By appending the GAN-enhanced features
to the noisy inputs and retraining, we achieve a 7% WER
improvement relative to the MTR system.
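
As a rough illustration of two ideas in the abstract, the sketch below shows (1) appending enhanced features to the noisy inputs frame by frame, and (2) how a relative WER improvement is computed. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `append_enhanced` and `relative_wer_improvement`, the list-of-frames layout, and all numeric values are hypothetical, and the enhanced frames here are made-up numbers rather than the output of a GAN.

```python
from typing import List

Frame = List[float]  # one frame of log-Mel filterbank features (assumed layout)


def append_enhanced(noisy: List[Frame], enhanced: List[Frame]) -> List[Frame]:
    """Concatenate each noisy frame with its enhanced counterpart.

    The result has the same number of frames and twice the feature dimension.
    """
    if len(noisy) != len(enhanced):
        raise ValueError("noisy and enhanced must have the same number of frames")
    return [n + e for n, e in zip(noisy, enhanced)]


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Toy data: 2 frames x 3 filterbank channels (made-up values, not from the paper).
noisy = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
enhanced = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]

stacked = append_enhanced(noisy, enhanced)
print(len(stacked), len(stacked[0]))  # frames, features per frame
```

Under this definition, a relative improvement is measured against the baseline system's WER rather than as an absolute difference in percentage points, which is how the "relative to the MTR system" figure above should be read.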
