Adaptive Training with Joint Uncertainty Decoding for Robust Recognition of Noisy Data
Abstract
Standard noise compensation techniques for automatic speech recognition assume a clean-trained acoustic model. Yet what is thought of as "clean" data may still contain a variety of speakers, different channels and varying noise conditions, so it may be more reasonable to treat such data as multi-condition and suited to multistyle training. This paper shows that multistyle models benefit from VTS compensation or joint uncertainty decoding, which reduce the mismatch between training and test conditions. An EM-based noise estimation procedure that produces ML VTS or joint noise models is also described. Alternatively, adaptive training with joint uncertainty transforms factors the noise out of the data. The uncertainty variance bias de-weights observations in the training data where the SNR is low. This property allows data with a wide SNR range to be used and produces canonical models that truly represent clean speech, whereas multistyle-trained models must account for all the acoustic variation associated with different noise conditions. This paper presents joint adaptive training, including formulae for estimating the transforms and the canonical model parameters. Experiments are conducted on the Resource Management and Broadcast News corpora.
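The de-weighting effect of the uncertainty variance bias can be illustrated with a minimal sketch. This is not code from the paper: the scalar single-Gaussian form, the function names, and the specific parameter values are illustrative assumptions. It shows how inflating a model variance by an SNR-dependent bias flattens the Gaussian, shrinking the mismatch penalty for outlying (noisy) frames so they contribute less sharply to the likelihood.

```python
import math

def log_gauss(y, mu, var):
    """Log-likelihood of a scalar observation y under N(mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mu) ** 2 / var)

def jud_log_likelihood(y, mu, var, a=1.0, b=0.0, var_bias=0.0):
    """Joint-uncertainty-decoding style likelihood for one Gaussian
    (scalar sketch): the observation is linearly transformed (a*y + b)
    and the model variance is inflated by a noise-dependent bias."""
    return math.log(abs(a)) + log_gauss(a * y + b, mu, var + var_bias)

mu, var = 0.0, 1.0

# A badly mismatched (low-SNR) frame: with a large variance bias the
# penalty (y - mu)^2 / (var + var_bias) shrinks, so the frame is
# effectively de-weighted rather than dominating the statistics.
mismatch_plain = jud_log_likelihood(5.0, mu, var, var_bias=0.0)
mismatch_bias = jud_log_likelihood(5.0, mu, var, var_bias=9.0)

# A well-matched frame: the flattened Gaussian assigns it a lower
# peak likelihood, reflecting the reduced confidence in noisy data.
match_plain = jud_log_likelihood(0.0, mu, var, var_bias=0.0)
match_bias = jud_log_likelihood(0.0, mu, var, var_bias=9.0)
```

Here `mismatch_bias > mismatch_plain` while `match_bias < match_plain`: the bias compresses the dynamic range of the likelihood, which is the property that lets training data spanning a wide SNR range be absorbed without corrupting the canonical clean-speech models.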