SpecAugment: A Simple Augmentation Method for Automatic Speech Recognition

Daniel S. Park; William Chan; Yu Zhang; Chung-Cheng Chiu; Barret Zoph; Ekin Dogus Cubuk; Quoc V. Le

INTERSPEECH (2019) (to appear)

Abstract

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank features). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment to Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with language model rescoring. This compares to 7.5% WER for the previous state-of-the-art hybrid system. For Switchboard, we achieve 7.2%/15.4% WER on the Switchboard/CallHome portions of the Hub5'00 test set without the use of a language model, which compares to 8.3%/17.3% WER for the previous state-of-the-art hybrid system.
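The two masking steps of the policy can be illustrated with a short pure-Python sketch. It is not the authors' implementation, and it rests on several assumptions. The spectrogram is a list of frames (time steps) of filter bank values. Each mask width is drawn uniformly from 0 to a maximum. The hypothetical parameters `F` and `T` set that maximum for frequency and time masks respectively. Each block is placed uniformly so that it fits inside the spectrogram. Masked entries are filled with 0.0, which equals the feature mean only if the features are mean-normalized. The function names are hypothetical. Time warping is omitted because it needs an image-warping routine.

```python
import random
from typing import List, Optional

# Assumed layout: spec[time_step][frequency_channel]
Spectrogram = List[List[float]]


def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Return a copy with one block of consecutive frequency channels set to `fill`
    in every time step (hypothetical helper, not the paper's code)."""
    num_channels = len(spec[0])
    width = min(rng.randint(0, max_width), num_channels)  # width drawn from [0, max_width]
    start = rng.randint(0, num_channels - width)          # start chosen so the block fits
    out = [row[:] for row in spec]
    for row in out:
        for c in range(start, start + width):
            row[c] = fill
    return out


def time_mask(spec: Spectrogram, max_width: int, rng: random.Random,
              fill: float = 0.0) -> Spectrogram:
    """Return a copy with one block of consecutive time steps set to `fill`
    in every frequency channel (hypothetical helper)."""
    num_steps = len(spec)
    width = min(rng.randint(0, max_width), num_steps)
    start = rng.randint(0, num_steps - width)
    out = [row[:] for row in spec]
    for t in range(start, start + width):
        out[t] = [fill] * len(out[t])
    return out


def spec_augment(spec: Spectrogram, F: int, T: int, num_freq_masks: int = 1,
                 num_time_masks: int = 1, seed: Optional[int] = None) -> Spectrogram:
    """Apply frequency masks, then time masks, to a copy of `spec` (warping omitted)."""
    rng = random.Random(seed)
    out = [row[:] for row in spec]
    for _ in range(num_freq_masks):
        out = freq_mask(out, F, rng)
    for _ in range(num_time_masks):
        out = time_mask(out, T, rng)
    return out
```

Because each call returns a new spectrogram and leaves its input untouched, the same utterance can be re-augmented with fresh random masks every time it is drawn during training.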

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

SpecAugment: A Simple Augmentation Method for Automatic Speech Recognition

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs