Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling
Abstract
State-of-the-art automatic speech recognition (ASR) systems
typically rely on pre-processed features. This paper studies
the time-frequency duality in ASR feature extraction methods
and proposes extending the standard acoustic model with a
complex-valued linear projection layer to learn and optimize
features that minimize standard cost functions such as
cross-entropy. The proposed Complex Linear Projection (CLP)
features outperform pre-processed Log Mel features.
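The abstract does not give the exact form of the layer. The code below is a minimal sketch under stated assumptions. Each frame is transformed with a DFT (no windowing). The complex spectrum is multiplied by a complex weight matrix. Each projection is then compressed with a log-magnitude nonlinearity, as a rough analogue of Log Mel compression. The sketch also checks the time-frequency duality mentioned above: projecting the DFT with complex weights gives the same value as an inner product of the raw frame with an equivalent time-domain filter. The function names `dft`, `clp_features`, and `equivalent_time_filter`, the `eps` floor, and the random weights are all hypothetical. In a real model, the weights would be learned jointly with the acoustic model by minimizing a cost such as cross-entropy.

```python
import cmath
import math
import random


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def clp_features(frame, weights, eps=1e-6):
    """Hypothetical CLP-style layer: log-magnitude of a complex linear projection.

    weights: one row per output feature; row[k] is the complex weight for DFT bin k.
    Rows may cover only the first len(row) bins (e.g. N // 2 + 1 for real input).
    The log(|.| + eps) nonlinearity is an assumption made for illustration.
    """
    spectrum = dft(frame)
    features = []
    for row in weights:
        # zip pairs row[k] with spectrum[k] for k < len(row)
        projection = sum(w * x for w, x in zip(row, spectrum))
        features.append(math.log(abs(projection) + eps))
    return features


def equivalent_time_filter(row, n):
    """Time-domain filter h with sum_k row[k] * X[k] == sum_t x[t] * h[t].

    Follows from X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n), then swapping the sums.
    """
    return [
        sum(w * cmath.exp(-2j * math.pi * k * t / n) for k, w in enumerate(row))
        for t in range(n)
    ]


# Usage: 16-sample frame, 4 projections over the non-redundant bins.
random.seed(0)
n, num_filters = 16, 4
frame = [random.uniform(-1.0, 1.0) for _ in range(n)]
weights = [
    [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n // 2 + 1)]
    for _ in range(num_filters)
]
feats = clp_features(frame, weights)

# Duality check: frequency-domain projection vs. time-domain inner product.
row = weights[0]
freq_side = sum(w * x for w, x in zip(row, dft(frame)))
h = equivalent_time_filter(row, n)
time_side = sum(x * ht for x, ht in zip(frame, h))
print(len(feats), abs(freq_side - time_side) < 1e-9)  # prints: 4 True
```

The naive DFT keeps the sketch dependency-free. A practical implementation would use an FFT and batch the projection as a single complex matrix multiply. The duality check shows why a complex projection in the frequency domain can stand in for a learned time-domain filter.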