Abstract
Language models (LMs) for automatic speech recognition (ASR) are traditionally evaluated using either the information-theoretically motivated perplexity (PPL) or the word error rate (WER), the latter measured by plugging the model into a speech recognizer.
It is well known that WER and PPL are poorly correlated. The main reason is probably that PPL measures the predictive power of the LM on correct text, whereas at recognition time the LM needs to discriminate between the alternatives suggested by the acoustic model used in the recognizer. Since the LM is estimated using maximum-likelihood methods on correct (well-formed) sentences, it is poorly suited for discriminating among the candidates that the acoustic model proposes as likely.
We propose a new evaluation metric for LMs that takes into account the coupling between the language model and the acoustic model in a given ASR system. The new metric, “acoustic-model-sensitive” perplexity (AMS-PPL), is intended to allow the LM parameters to be optimized so that the LM performs best when used with a given acoustic model. The main underlying idea is to estimate the conditional cross-entropy H(W|A) of the correct word sequence W given the acoustic signal A to be decoded.
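
As an illustration only, one way such a conditional cross-entropy could be estimated is sketched below. The sketch is not taken from this work and rests on several assumptions: the set of competing hypotheses for each utterance is approximated by a finite candidate list, each candidate carries an acoustic log-likelihood log P(A|W) and an LM log-probability log P(W), the posterior P(W|A) is obtained by normalizing over that list, and the result is averaged per word. The function names, the `lm_weight` parameter, and the input layout are all hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conditional_cross_entropy(utterances, lm_weight=1.0):
    """Estimate H(W|A) in nats per word (assumed normalization).

    Each utterance is a dict with (hypothetical layout):
      "candidates": list of (acoustic_logprob, lm_logprob) pairs,
                    one per competing word sequence,
      "correct":    index of the correct word sequence in that list,
      "num_words":  number of words in the correct word sequence.
    """
    total_log_posterior = 0.0
    total_words = 0
    for utt in utterances:
        # Joint score log P(A|W) + lm_weight * log P(W) for every candidate.
        scores = [ac + lm_weight * lm for ac, lm in utt["candidates"]]
        # Log posterior of the correct sequence, normalized over the
        # candidate list (an approximation of the full hypothesis space).
        total_log_posterior += scores[utt["correct"]] - log_sum_exp(scores)
        total_words += utt["num_words"]
    return -total_log_posterior / total_words


def ams_ppl(utterances, lm_weight=1.0):
    """Exponentiated conditional cross-entropy (assumed form of AMS-PPL)."""
    return math.exp(conditional_cross_entropy(utterances, lm_weight))
```

Under these assumptions, an LM that gives the correct word sequence a higher probability relative to its acoustically confusable competitors yields a lower value, whereas ordinary PPL depends only on the probability of the correct text and ignores the competitors.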