Entropy-Based Pruning of Backoff MaxEnt Language Models with Contextual Features
Abstract
In this paper, we present a pruning technique for maximum entropy (MaxEnt)
language models. It is based on computing the exact entropy loss when removing
each feature from the model, and it explicitly supports backoff features by
replacing each removed feature with its backoff.
The algorithm computes the loss on the training data, so it is not restricted to
models with n-gram-like features; it supports models with arbitrary features, including
long-range skips, triggers, and contextual features such as device location.
Results on the 1-billion-word corpus show large perplexity improvements
relative to frequency-pruned models of comparable size.
Automatic speech recognition (ASR) experiments show up to 0.2\% absolute WER
improvement in a large-scale, cloud-based mobile ASR system for Italian.
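The following is a minimal sketch of the pruning criterion summarized above: the exact training-data log-likelihood loss incurred when one feature is removed and the contexts it covered fall back on its backoff feature. The feature names, weights, and counts are toy assumptions for illustration, not the paper's implementation or data.

```python
# Sketch of the entropy-loss pruning criterion (illustrative, not the paper's code).
import math

def log_probs(weights, vocab, context_feats):
    """Softmax log-probabilities of a MaxEnt model for one context:
    score(w) = sum of weights of features active for (context, w)."""
    scores = {w: sum(weights.get((f, w), 0.0) for f in context_feats) for w in vocab}
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {w: s - log_z for w, s in scores.items()}

def entropy_loss(weights, vocab, events, feat, backoff):
    """Exact training-data log-likelihood loss from removing `feat` and
    letting the contexts it covered rely on `backoff` instead."""
    loss = 0.0
    for context_feats, word, count in events:
        if feat not in context_feats:
            continue  # removing `feat` cannot change these events
        full = log_probs(weights, vocab, context_feats)[word]
        pruned_feats = [f for f in context_feats if f != feat]
        if backoff not in pruned_feats:
            pruned_feats.append(backoff)
        backed_off = log_probs(weights, vocab, pruned_feats)[word]
        loss += count * (full - backed_off)
    return loss

# Toy example: a trigram-like feature "the cat ->" backing off to "cat ->".
vocab = ["sat", "ran", "is"]
weights = {("the cat ->", "sat"): 1.2, ("cat ->", "sat"): 0.4, ("cat ->", "ran"): 0.1}
events = [(["the cat ->", "cat ->"], "sat", 5), (["the cat ->", "cat ->"], "ran", 1)]
print(entropy_loss(weights, vocab, events, "the cat ->", "cat ->"))
```

Features with the smallest loss would be the first candidates for removal; a real implementation would evaluate all features and prune against a threshold or a target model size.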