StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes

Manish Kumar Sharma; Tom Kenter; Rob Clark

StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes

Manish Kumar Sharma

Tom Kenter

Rob Clark

(2020) (to appear)

Download Google Scholar

Abstract

Recently, WaveNet has become a popular choice of neural network to synthesize speech audio. Autoregressive WaveNet is capable of producing high-fidelity audio, but is too slow for real-time synthesis. As a remedy, Parallel WaveNet was proposed, which can produce audio faster than real time through distillation of an autoregressive teacher into a feedforward student network. A shortcoming of this approach, however, is that a large amount of recorded speech data is required to produce high-quality student models, and this data is not always available. In this paper, we propose StrawNet: a self-training approach to train a Parallel WaveNet. Self-training is performed using the synthetic examples generated by the autoregressive WaveNet teacher. We show that, in low-data regimes, training on high-fidelity synthetic data from an autoregressive teacher model is superior to training the student model on (much fewer) examples of recorded speech. We compare StrawNet to a baseline Parallel WaveNet, using both side-by-side tests and Mean Opinion Score evaluations. To our knowledge, synthetic speech has not been used to train neural text-to-speech before.

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs