A Gaussian Mixture Model Layer Jointly Optimized with Discriminative Features within a Deep Neural Network Architecture

Erik McDermott
ICASSP, IEEE (2015)

Abstract

This article proposes and evaluates a Gaussian Mixture Model
(GMM) represented as the last layer of a Deep Neural Network
(DNN) architecture and jointly optimized with all previous layers
using Asynchronous Stochastic Gradient Descent (ASGD). The resulting “Deep GMM” architecture was investigated with special attention
to the following issues: (1) The extent to which joint optimization
improves over separate optimization of the DNN-based
feature extraction layers and the GMM layer; (2) The extent to which
depth (measured in number of layers, for a matched total number
of parameters) helps a deep generative model based on the GMM
layer, compared to a vanilla DNN model; (3) Head-to-head performance
of Deep GMM architectures vs. equivalent DNN architectures
of comparable depth, using the same optimization criterion
(frame-level Cross Entropy (CE)) and optimization method (ASGD);
(4) Expanded possibilities for modeling offered by the Deep GMM
generative model. The proposed Deep GMMs were found to yield
Word Error Rates (WERs) competitive with state-of-the-art DNN
systems, at the cost of pre-training using standard DNNs to initialize
the Deep GMM feature extraction layers. An extension to Deep
Subspace GMMs is described, resulting in additional gains.
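The sketch below is a minimal, illustrative rendering (not the paper's implementation) of the core idea in the abstract: a GMM "layer" that scores DNN-extracted features with one diagonal-covariance mixture per class, from which frame-level class posteriors and a Cross Entropy loss can be computed. All function and parameter names here are hypothetical, chosen only to make the architecture concrete.

```python
# Minimal sketch (assumption: illustrative only, not the paper's implementation).
# A GMM output layer scores DNN features x with one mixture per class; the
# frame-level cross entropy is the negative log posterior of the target class,
# so gradients could in principle flow back into the feature-extraction layers.
import numpy as np

def gmm_layer_log_likelihoods(x, weights, means, log_vars):
    """Per-class log p(x | class) for diagonal-covariance GMMs.

    x        : (D,) feature vector from the DNN feature-extraction layers
    weights  : (C, M) mixture weights per class (rows sum to 1)
    means    : (C, M, D) component means
    log_vars : (C, M, D) component log-variances
    """
    D = x.shape[0]
    diff = x - means                                      # (C, M, D)
    # Per-component Gaussian log densities, shape (C, M)
    comp_ll = -0.5 * (np.sum(log_vars, axis=-1)
                      + np.sum(diff ** 2 * np.exp(-log_vars), axis=-1)
                      + D * np.log(2.0 * np.pi))
    # Log-sum-exp over the weighted mixture components
    weighted = np.log(weights + 1e-12) + comp_ll          # (C, M)
    m = weighted.max(axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(weighted - m),
                              axis=-1, keepdims=True))).squeeze(-1)

def frame_cross_entropy(x, target_class, weights, means, log_vars, log_priors):
    """Frame-level CE: -log posterior of the target class under the GMM layer."""
    log_lik = gmm_layer_log_likelihoods(x, weights, means, log_vars)
    log_joint = log_lik + log_priors                      # log p(x, class)
    log_post = log_joint - np.logaddexp.reduce(log_joint)
    return -log_post[target_class]

# Toy usage: 3 classes, 2 mixture components each, 4-dimensional features
rng = np.random.default_rng(0)
C, M, D = 3, 2, 4
x = rng.normal(size=D)
w = np.full((C, M), 0.5)
mu = rng.normal(size=(C, M, D))
lv = np.zeros((C, M, D))
print(frame_cross_entropy(x, 1, w, mu, lv, np.log(np.ones(C) / C)))
```

In the paper's setting, this loss would be minimized jointly over the GMM parameters and all preceding DNN layers using ASGD; the sketch omits the gradient computation and the ASGD training loop.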