WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding

Björn Hoffmeister
Ralf Schlüter
Hermann Ney
IEEE Transactions on Audio, Speech, and Language Processing, 20(2012), pp. 551-564

Abstract

During the last decade, weighted finite-state transducers (WFSTs) have become popular in speech recognition. While their main field of application remains hidden Markov model (HMM) decoding, the WFST framework is now also used as a building block in solutions to many other central problems in automatic speech recognition (ASR). These solutions are less well known, and this work gives an overview of the applications of WFSTs in large-vocabulary continuous speech recognition (LVCSR) beyond HMM decoding: discriminative acoustic model training, Bayes risk decoding, and system combination. Applying the WFST framework has considerable practical impact: we show how the framework helps to structure problems, to develop generic solutions, and to delegate complex computations to WFST toolkits. In this paper, we review the literature, discuss existing approaches, and provide new insights into WFST-enabled solutions. We also present a novel, purely WFST-based algorithm for computing the exact Bayes risk hypothesis from a lattice with the Levenshtein distance as the loss function. We present the problems and their solutions in a unified framework and discuss the advantages and limits of using WFSTs. We do not provide new experimental results, but refer to the existing literature. Our work helps to identify where and how the transducer framework can contribute to compact and generic solutions to LVCSR problems.
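
To make the notion of Bayes risk decoding with a Levenshtein loss concrete, the following is a minimal sketch over a small N-best list. It is not the lattice-based WFST algorithm of the paper: the function names, the toy hypotheses, and the posterior values are all assumptions chosen for illustration, and the search is restricted to the listed hypotheses.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_hypothesis(nbest):
    """Pick the hypothesis with minimum expected Levenshtein loss.

    nbest: list of (tokens, posterior) pairs; posteriors are assumed
    to be normalized over the list (hypothetical toy input).
    """
    def risk(candidate):
        return sum(p * levenshtein(candidate, other) for other, p in nbest)

    return min((hyp for hyp, _ in nbest), key=risk)


# Toy N-best list with made-up posteriors.
nbest = [
    ("a b c".split(), 0.4),
    ("a x d".split(), 0.3),
    ("a x c".split(), 0.3),
]
best = mbr_hypothesis(nbest)
print(" ".join(best))
```

In this toy example the minimum-risk hypothesis ("a x c", expected loss 0.7) differs from the most probable one ("a b c", expected loss 0.9), which is the effect Bayes risk decoding is designed to exploit.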
