Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends

Zhen-Hua Ling; Shiyin Kang; Heiga Zen; Andrew Senior; Mike Schuster; Xiao-Jun Qian; Helen Meng; Li Deng

IEEE Signal Processing Magazine, 32 (2015), pp. 35-52

Abstract

Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most common types of acoustic models used in statistical parametric approaches for generating low-level speech waveforms from high-level symbolic inputs via intermediate acoustic feature sequences. However, these models have their limitations in representing complex, nonlinear relationships between the speech generation inputs and the acoustic features. Inspired by the intrinsically hierarchical process of human speech production and by the successful application of deep neural networks (DNNs) to automatic speech recognition (ASR), deep learning techniques have also been applied successfully to speech generation, as reported in recent literature.
