Improved language model adaptation using existing and derived external resources

Lin-Shan Lee
IEEE Workshop on Automatic Speech Recognition and Understanding, U.S. Virgin Islands (2003)


Adapting language models so that their parameters better match the topics addressed by the spoken documents to be recognized has been a key issue in speech recognition. In this paper, we propose collecting both existing and derived external resources for improved language model adaptation. The derived external resources are those retrieved from the Internet with a search engine, using queries based on the baseline transcriptions of the input spoken documents. The design of such queries is also analyzed in this paper, taking into account the special structure of the Chinese language. The existing and derived external resources obtained in this way are then used for model adaptation under a Clustering-Classification framework. Very encouraging results were obtained in preliminary experiments on two test sets: broadcast news and interview recordings.
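The abstract does not spell out how the Clustering-Classification framework works. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes the external documents have already been grouped into topic clusters. It assumes the baseline transcription is assigned to the most similar cluster by cosine similarity of term-frequency vectors. It also assumes adaptation is a linear interpolation of a background unigram model with the chosen cluster's unigram model. The function names (`classify`, `unigram`, `adapted_prob`), the weight `lam`, and the toy data are hypothetical. For Chinese input, the tokens would be words or characters produced by some segmenter, which is also assumed.

```python
from collections import Counter
import math


def tf_vector(tokens):
    """Term-frequency vector of a token list (hypothetical helper)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(transcription_tokens, clusters):
    """Classification step (assumed): pick the cluster whose pooled
    term vector is most similar to the baseline transcription."""
    query = tf_vector(transcription_tokens)
    return max(clusters, key=lambda name: cosine(query, tf_vector(clusters[name])))


def unigram(tokens):
    """Maximum-likelihood unigram model; n-grams omitted for brevity."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def adapted_prob(word, background, topic, lam=0.5):
    """Assumed adaptation: P(w) = lam * P_topic(w) + (1 - lam) * P_bg(w)."""
    return lam * topic.get(word, 0.0) + (1 - lam) * background.get(word, 0.0)


if __name__ == "__main__":
    # Toy clusters standing in for grouped external resources (hypothetical data).
    clusters = {
        "finance": "stock market price stock bank".split(),
        "sports": "game team score game player".split(),
    }
    best = classify("the team won the game".split(), clusters)
    background = unigram("the team the game the bank".split())
    topic = unigram(clusters[best])
    print(best, adapted_prob("game", background, topic))
```

Cosine similarity on raw term frequencies is used here only because it is the simplest choice. A fuller system would more plausibly use weighted term vectors, higher-order n-gram models, and interpolation weights tuned on held-out data.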
