Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization

Wei-Ning Hsu; Yu Zhang; Ron J. Weiss; Yu-An Chung; Yuxuan Wang; Yonghui Wu; James Glass

ICASSP (2019)

Abstract

To leverage crowd-sourced data to train multi-speaker text-to-speech (TTS) models
that can synthesize clean speech for all speakers, it is essential to learn disentangled
representations which can independently control the speaker identity and background noise in generated signals. However, learning such representations can be
challenging, due to the lack of labels describing the recording conditions of each
training example, and the fact that speakers and recording conditions are often
correlated, e.g. because users often make many recordings using the same equipment.
This paper addresses the problem with three components: (1) a conditional
generative model with factorized latent variables, (2) data augmentation that
adds noise which is uncorrelated with speaker identity and whose label is known
during training, and (3) adversarial factorization to improve
disentanglement. Experimental results demonstrate that the proposed method can
disentangle speaker and noise attributes even if they are correlated in the training
data, and can be used to consistently synthesize clean speech for all speakers.
Ablation studies verify the importance of each proposed component.
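
As an illustration of the data augmentation component described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that each utterance is augmented with probability `p_augment`, decided without looking at the speaker, and that the noise is mixed at a signal-to-noise ratio drawn uniformly from `snr_db_range`. The function names, the default probability, and the SNR range are hypothetical choices made for this example only.

```python
import math
import random


def mean_power(signal):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def augment_with_noise(speech, noise, rng, p_augment=0.5, snr_db_range=(5.0, 25.0)):
    """Return (samples, label), where label is 1 if noise was added, else 0.

    Assumption for this sketch: the coin flip ignores the speaker, so the
    resulting label is uncorrelated with speaker identity and is known
    exactly at training time.
    """
    if rng.random() >= p_augment:
        return list(speech), 0
    # Repeat the noise clip so that it covers the whole utterance.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    snr_db = rng.uniform(*snr_db_range)
    # Choose the noise power that yields the sampled SNR (in dB).
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / max(mean_power(tiled), 1e-12))
    return [s + scale * n for s, n in zip(speech, tiled)], 1


# Usage: a synthetic "utterance" and noise clip stand in for real audio.
rng = random.Random(0)
speech = [math.sin(0.01 * t) for t in range(1600)]
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
noisy, label = augment_with_noise(speech, noise, rng, p_augment=1.0)
```

The returned label could then serve as the known supervision signal for a noise-related latent variable, while the original recording conditions remain unlabeled.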

Research Areas

Machine intelligence
