Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization

Norbert Braunschweiler
Sabine Buchholz
Mark J. F. Gales
Kate Knill
Sacha Krstulovic
Javier Latorre
IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, 2012, pp. 1713-1724

Abstract

An increasingly common scenario in building speech synthesis and recognition systems is training on inhomogeneous data. This paper proposes a new framework for estimating hidden Markov models on data containing both multiple speakers and multiple languages. The proposed framework, speaker and language factorization, attempts to factorize speaker-/language-specific characteristics in the data and then model them using separate transforms. Language-specific factors in the data are represented by transforms based on cluster mean interpolation with cluster-dependent decision trees. Acoustic variations caused by speaker characteristics are handled by transforms based on constrained maximum-likelihood linear regression. Experimental results on statistical parametric speech synthesis show that the proposed framework enables data from multiple speakers in different languages to be used to: train a synthesis system; synthesize speech in a language using speaker characteristics estimated in a different language; and adapt to a new language.
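The abstract describes two kinds of transforms: language-specific factors captured by interpolating cluster means, and speaker-specific factors captured by a constrained MLLR (CMLLR) affine transform of the acoustic features. The following is a minimal toy sketch of those two operations only; all variable names, weights, and dimensions are illustrative assumptions, not the paper's actual models or estimation procedure.

```python
import numpy as np

def interpolated_mean(cluster_means, language_weights):
    """Gaussian mean built as a weighted sum of cluster means
    (illustrates the language-specific cluster mean interpolation)."""
    return sum(w * mu for w, mu in zip(language_weights, cluster_means))

def cmllr_transform(observation, A, b):
    """Affine feature transform A*o + b
    (illustrates the speaker-specific CMLLR transform)."""
    return A @ observation + b

# Toy example: 3 clusters in a 2-dimensional feature space (assumed values).
cluster_means = [np.array([0.0, 1.0]),
                 np.array([1.0, 0.0]),
                 np.array([0.5, 0.5])]
lang_weights = [0.2, 0.3, 0.5]      # assumed language-dependent weights
mu = interpolated_mean(cluster_means, lang_weights)

A = 0.9 * np.eye(2)                 # assumed speaker transform matrix
b = np.array([0.1, -0.1])           # assumed speaker bias
o = np.array([0.4, 0.6])            # assumed observation vector
o_hat = cmllr_transform(o, A, b)
```

In the actual framework the interpolation weights and CMLLR parameters are estimated by maximum likelihood from the multi-speaker, multi-language training data; the sketch above only shows the functional form of each factor.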
