WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

Nanxin Chen

Yu Zhang

Heiga Zen (Byungha Chun)

Ron J. Weiss

Mohammad Norouzi

Najim Dehak

William Chan

Interspeech (2021)


Abstract

This paper introduces WaveGrad 2, an end-to-end non-autoregressive generative model for text-to-speech synthesis trained to estimate the gradients of the data density. Unlike recent TTS systems, which are cascades of separately learned models, the proposed model requires only a text or phoneme sequence during training, learns all parameters end-to-end without intermediate features, and can generate natural speech audio with great variety. This is achieved by the score matching objective, which optimizes the network to model the score function of the real data distribution. Output waveforms are generated using an iterative refinement process beginning from a random noise sample. Like our prior work, WaveGrad 2 offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps. Experiments reveal that the model can generate high-fidelity audio, closing the gap between end-to-end and contemporary systems and approaching the performance of a state-of-the-art neural TTS system. We further carry out various ablations to study the impact of different model configurations.
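The iterative refinement described in the abstract can be illustrated with a minimal sketch of DDPM-style ancestral sampling, the general family of samplers that starts from Gaussian noise and denoises step by step. This is not the paper's implementation. The names `linear_beta_schedule`, `refine`, and `oracle_noise`, the linear noise schedule and its endpoints, and the toy sine "waveform" are all assumptions made for illustration. In a real system the noise predictor would be a trained network conditioned on the text or phoneme sequence; here it is replaced by a closed-form oracle for a single known target. The predicted noise relates to the score up to a scale factor of -1/sqrt(1 - alpha_bar).

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Hypothetical linear noise schedule; the abstract does not specify one."""
    return np.linspace(beta_start, beta_end, num_steps)

def refine(predict_noise, shape, betas, rng):
    """Start from Gaussian noise and apply one denoising update per schedule step."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    y = rng.standard_normal(shape)  # random noise sample
    for n in reversed(range(len(betas))):
        eps_hat = predict_noise(y, alpha_bars[n])
        # Remove the predicted noise component (posterior mean of the previous step).
        y = (y - betas[n] / np.sqrt(1.0 - alpha_bars[n]) * eps_hat) / np.sqrt(alphas[n])
        if n > 0:
            # Re-inject a smaller amount of noise on every step except the last.
            sigma = np.sqrt((1.0 - alpha_bars[n - 1]) / (1.0 - alpha_bars[n]) * betas[n])
            y = y + sigma * rng.standard_normal(shape)
    return y

# Toy stand-in for a waveform: a 5 Hz sine sampled at 400 points.
target = np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 400))

def oracle_noise(y, alpha_bar):
    """Exact noise estimate when the data distribution is the single point `target`.
    A trained, text-conditioned network would take this role in practice."""
    return (y - np.sqrt(alpha_bar) * target) / np.sqrt(1.0 - alpha_bar)

rng = np.random.default_rng(0)
few = refine(oracle_noise, target.shape, linear_beta_schedule(6), rng)
many = refine(oracle_noise, target.shape, linear_beta_schedule(1000), rng)
```

The number of refinement steps is simply the length of the schedule passed to `refine`, which is the knob the abstract describes for trading inference speed against sample quality. Because the oracle is exact, both runs above recover the target, so the sketch checks only the update rule and does not demonstrate the quality difference a learned model would show.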
