Hidden Conditional Random Fields for Phone Recognition
Abstract
We apply Hidden Conditional Random Fields (HCRFs) to the task of
TIMIT phone recognition. HCRFs are discriminatively trained
sequence models that augment conditional random fields with
hidden states that can represent subphones and mixture
components. We extend HCRFs, which
had previously only been applied to phone classification with
known boundaries, to recognize continuous phone sequences.
We use an N-best inference algorithm in both learning (to
approximate all competitor phone sequences) and decoding (to
marginalize over hidden states). Our monophone HCRFs achieve
28.3% phone error rate, outperforming maximum likelihood
trained HMMs by 3.6%, maximum mutual information trained
HMMs by 2.5%, and minimum phone error trained HMMs
by 2.2%. We show that this gain is partially due to HCRFs’
ability to simultaneously optimize discriminative language models
and acoustic models, a powerful property that has important
implications for speech recognition.
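
To illustrate the decoding step mentioned above (marginalizing over hidden states with an N-best list), the following is a minimal sketch and not the implementation used in this work. It assumes each N-best entry is a distinct hidden-state path that has already been mapped to its phone sequence and paired with an unnormalized log score. The names `logsumexp`, `marginalize_nbest`, and the example scores are hypothetical.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def marginalize_nbest(nbest):
    """Approximate marginalization over hidden states using an N-best list.

    nbest: iterable of (phone_sequence, log_score) pairs, one per hidden-state
    path (assumed format). Paths that share a phone sequence have their scores
    summed in the log domain, and the phone sequence with the largest total
    is returned together with all totals.
    """
    grouped = defaultdict(list)
    for phones, log_score in nbest:
        grouped[tuple(phones)].append(log_score)
    totals = {phones: logsumexp(scores) for phones, scores in grouped.items()}
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical scores: the single best path says "k ah t", but two paths
# agree on "k ae t" and their combined score is higher.
example_nbest = [
    (("k", "ae", "t"), -10.0),
    (("k", "ah", "t"), -9.8),
    (("k", "ae", "t"), -10.1),
]
best_phones, phone_totals = marginalize_nbest(example_nbest)
print(best_phones, phone_totals)
```

In this toy example the highest-scoring single path and the highest-scoring phone sequence differ, which is the situation in which summing over hidden-state paths changes the decoded output.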