Acoustic Modeling in Statistical Parametric Speech Synthesis - From HMM to LSTM-RNN

Heiga Zen

Proc. MLSLP (2015)

Abstract

Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder to render speech from text. Typically, decision tree-clustered context-dependent hidden Markov models (HMMs) are employed as the acoustic model; they represent the relationship between linguistic and acoustic features. Recently, artificial neural network-based acoustic models, such as deep neural networks, mixture density networks, and long short-term memory recurrent neural networks (LSTM-RNNs), have shown significant improvements over the HMM-based approach. This paper reviews the progress of acoustic modeling in SPSS from the HMM to the LSTM-RNN.
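The abstract describes the acoustic model as a mapping from linguistic features to acoustic features. The following is a minimal, hypothetical sketch of the deep neural network case, assuming a frame-level feedforward network. The class and function names (`FrameDNN`, `linear`, `init_layer`), the layer sizes, and the untrained random weights are illustrative assumptions, not the paper's configuration. The vocoder stage that turns acoustic features into a waveform is not shown.

```python
import math
import random


def linear(x, weights, bias):
    # Affine map y = W x + b, with W stored as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases; untrained, for illustration only.
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


class FrameDNN:
    """Hypothetical feedforward acoustic model: one linguistic feature
    vector in, one acoustic feature vector out, for each frame."""

    def __init__(self, dims, seed=0):
        # dims = [input_dim, hidden_1, ..., output_dim] (assumed sizes).
        rng = random.Random(seed)
        self.layers = [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]

    def predict_frame(self, x):
        h = x
        for i, (w, b) in enumerate(self.layers):
            h = linear(h, w, b)
            if i < len(self.layers) - 1:
                h = [math.tanh(v) for v in h]  # nonlinearity on hidden layers only
        return h  # linear output layer for real-valued acoustic features

    def predict(self, frames):
        # Each frame is mapped independently; the network keeps no state across frames.
        return [self.predict_frame(x) for x in frames]


model = FrameDNN([8, 16, 16, 4])
acoustic = model.predict([[0.0] * 8 for _ in range(3)])  # 3 frames, 4 outputs each
```

Because this feedforward sketch maps each frame in isolation, any temporal context must be encoded in the input features. An LSTM-RNN, in contrast, carries a recurrent state from frame to frame.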
