Michael Riley
Michael Riley has a B.S., M.S., and Ph.D. from MIT, all in computer science. He began his career at Bell Labs and AT&T Labs where he, together with Mehryar Mohri and Fernando Pereira, introduced and developed the theory and use of weighted finite-state transducers (WFSTs) in speech and language. He is currently a distinguished research scientist at Google, Inc. His interests include speech and natural language processing, machine learning, and information retrieval. He is a principal author of the OpenFst library. He manages a group with expertise that includes speech recognition and synthesis, NLP, information retrieval, image processing, algorithms, machine learning, and privacy. He is an IEEE and ISCA Fellow.
Authored Publications
Google Publications
Other Publications
    This paper explores ways to improve a two-pass speech recognition system in which the first pass is a hybrid autoregressive transducer model and the second pass is a neural language model. The main focus is on the scores provided by each of these models: their quantitative analysis, how to improve them, and the best way to integrate them with the objective of better recognition accuracy. Several analyses are presented to show the importance of the choice of the integration weights for combining the first-pass and second-pass scores. A sequence-level weight estimation model along with four training criteria are proposed, which allow adaptive integration of the scores per acoustic sequence. The effectiveness of this algorithm is demonstrated by constructing and analyzing models on the LibriSpeech data set.
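    The score integration described above can be illustrated with a minimal sketch. All names and the fixed n-best format here are hypothetical, chosen for illustration; the paper's adaptive, per-sequence weight estimation is not reproduced, only the basic log-linear combination it tunes:

    ```python
    def combine_scores(first_pass_logp, second_pass_logp, weight):
        """Log-linear combination of first-pass and second-pass log-probabilities
        for one hypothesis; `weight` is the integration weight."""
        return weight * first_pass_logp + (1.0 - weight) * second_pass_logp

    def rescore(nbest, weight):
        """Pick the hypothesis with the best combined score from an n-best list
        of (text, first_pass_logp, second_pass_logp) triples."""
        return max(nbest, key=lambda h: combine_scores(h[1], h[2], weight))
    ```

    A weight of 1.0 trusts the first pass alone and 0.0 trusts the second pass alone; the paper's contribution is estimating this weight per acoustic sequence rather than fixing it globally.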
    We introduce a framework for adapting a virtual keyboard to individual user behavior by modifying a Gaussian spatial model to use personalized key center offset means and, optionally, learned covariances. Through numerous real-world studies, we determine the importance of training data quantity and weights, as well as the number of clusters into which to group keys to avoid overfitting. While past research has shown the potential of this technique using artificially simple virtual keyboards and games or fixed typing prompts, we demonstrate its effectiveness using the highly tuned Gboard app with a representative set of users and their real typing behaviors. Across a variety of top languages, we achieve small but significant improvements in both typing speed and decoder accuracy.
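    A toy sketch of the idea of a personalized Gaussian spatial model follows. The function names and the isotropic (single-variance) Gaussian are simplifying assumptions for illustration; the paper's model also supports learned covariances per key:

    ```python
    import math

    def gaussian_logpdf(x, y, cx, cy, var):
        # Isotropic 2-D Gaussian log-density of a tap (x, y) around a key
        # center (cx, cy) with common variance `var`.
        return -((x - cx) ** 2 + (y - cy) ** 2) / (2 * var) - math.log(2 * math.pi * var)

    def personalize_center(center, taps):
        # Shift a key's center by the mean offset of a user's historical taps,
        # i.e., a personalized key center offset mean.
        dx = sum(t[0] - center[0] for t in taps) / len(taps)
        dy = sum(t[1] - center[1] for t in taps) / len(taps)
        return (center[0] + dx, center[1] + dy)
    ```

    After personalization, taps at the user's habitual offset score higher than under the default key center, which is the mechanism behind the reported decoder accuracy gains.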
    Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling n-gram models from neural models, building compact language models, and building open-vocabulary character models. The algorithms used for these experiments are available in an open-source software library.
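    The objective being minimized is the Kullback-Leibler divergence between the source distribution and the WFA's distribution. As a small, self-contained illustration of that quantity (over a finite alphabet, with distributions as plain dicts; not the paper's algorithm, which works over sequences):

    ```python
    import math

    def kl_divergence(p, q):
        """D(p || q) for two distributions over the same finite alphabet,
        given as symbol -> probability dicts. Terms with p(s) = 0 contribute 0."""
        return sum(pi * math.log(pi / q[s]) for s, pi in p.items() if pi > 0)
    ```

    KL divergence is zero exactly when the approximation matches the source, and positive otherwise, which is why driving it down yields a faithful compact WFA.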
    On-device end-to-end (E2E) models have shown improvements over a conventional model on Search test sets in both quality, as measured by Word Error Rate (WER), and latency, measured by the time the result is finalized after the user stops speaking. However, the E2E model is trained on a small fraction of audio-text pairs compared to the 100 billion text utterances that a conventional language model (LM) is trained with, so E2E models perform poorly on rare words and phrases. In this paper, building upon the two-pass streaming Cascaded Encoder E2E model, we explore using a Hybrid Autoregressive Transducer (HAT) factorization to better integrate an on-device neural LM trained on text-only data. To further improve decoder latency, we introduce a non-recurrent embedding decoder, in place of the typical LSTM decoder, into the Cascaded Encoder model. Overall, we present a streaming on-device model that incorporates an external neural LM and outperforms the conventional model in both search and rare-word quality, as well as latency, and is 318X smaller.
    Hybrid Autoregressive Transducer (HAT)
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 6139-6143
    This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to measure the quality of the internal language model that can be used to decide whether inference with an external language model is beneficial or not. We evaluate our proposed model on a large-scale voice search task. Our experiments show significant improvements in WER compared to the state-of-the-art approaches.
    Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference-of-convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on some tasks including distilling n-gram models from neural models.
    We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a machine learning technique to train global models to be used on portable devices such as smart phones, without the users' data ever leaving their devices. This is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models, mostly for latency reasons. We propose to train a recurrent neural network language model using the decentralized "FederatedAveraging" algorithm directly on device, and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include novel ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite-state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the device.
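    The core of the FederatedAveraging algorithm mentioned above is a server-side weighted average of client model updates. A minimal sketch of one aggregation round (models reduced to flat parameter lists; the function name and data layout are illustrative assumptions, not this paper's implementation):

    ```python
    def federated_average(client_weights, client_sizes):
        """One FederatedAveraging aggregation round: average the clients'
        parameter vectors, weighted by each client's number of training
        examples, so larger clients contribute proportionally more."""
        total = sum(client_sizes)
        num_params = len(client_weights[0])
        return [
            sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(num_params)
        ]
    ```

    Only these parameter vectors leave the device; the raw typing data used to compute them never does, which is the privacy property the paper relies on.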
    Latin script keyboards for South Asian languages with finite-state normalization
    Lawrence Wolf-Sonkin
    Vlad Schogol
    Proceedings of FSMNLP (2019), pp. 108-117
    The use of the Latin script for text entry of South Asian languages is common, even though there is no standard orthography for these languages in the script. We explore several compact finite-state architectures that permit variable spellings of words during mobile text entry. We find that approaches making use of transliteration transducers provide large accuracy improvements over baselines, but that simpler approaches involving a compact representation of many attested alternatives yield much of the accuracy gain. This is particularly important when operating under constraints on model size (e.g., on inexpensive mobile devices with limited storage and memory for keyboard models), and on speed of inference, since people typing on mobile keyboards expect no perceptual delay in keyboard responsiveness.
    Algorithms for Weighted Finite Automata with Failure Transitions
    International Conference on Implementation and Application of Automata (CIAA) (2018), pp. 46-58
    In this paper we extend some key weighted finite automata (WFA) algorithms to automata with failure transitions (phi-WFAs). Failure transitions, which are taken only when no immediate match is possible at a given state, are used to compactly represent automata and have many applications. An efficient intersection algorithm and a shortest distance algorithm (over R+) are presented, as well as a related algorithm to remove failure transitions from a phi-WFA.
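    The defining behavior of a failure (phi) transition, taken only when no immediate match exists, can be sketched in a few lines. This is an unweighted toy (plain dict automaton, illustrative names), not the paper's weighted algorithms:

    ```python
    def phi_lookup(transitions, failures, state, symbol):
        """Read `symbol` at `state` in an automaton with failure transitions:
        follow phi (failure) arcs until the symbol has an immediate match,
        or report no match if the failure chain runs out."""
        while symbol not in transitions.get(state, {}):
            if state not in failures:
                return None  # no match anywhere on the failure chain
            state = failures[state]
        return transitions[state][symbol]
    ```

    Because a state only stores the arcs that differ from its failure target, the automaton can be far smaller than its fully expanded equivalent.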
    Recent interest in intelligent assistants has increased demand for Automatic Speech Recognition (ASR) systems that can utilize contextual information to adapt to the user's preferences or the current device state. For example, a user might be more likely to refer to their favorite songs when giving a "music playing" command or request to watch a movie starring a particular favorite actor when giving a "movie playing" command. Similarly, when a device is in a "music playing" state, a user is more likely to give volume control commands. In this paper, we explore using semantic information inside the ASR word lattice by employing Named Entity Recognition (NER) to identify and boost contextually relevant paths in order to improve speech recognition accuracy. We use broad semantic classes comprising millions of entities, such as songs and musical artists, to tag relevant semantic entities in the lattice. We show that our method reduces Word Error Rate (WER) by 12.0% relative on a Google Assistant "media playing" commands test set, while not affecting WER on a test set containing commands unrelated to media.
    Transliterated mobile keyboard input via weighted finite-state transducers
    Lars Hellsten
    Prasoon Goyal
    Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP) (2017)
    We present an extension to a mobile keyboard input decoder based on finite-state transducers that provides general transliteration support, and demonstrate its use for input of South Asian languages using a QWERTY keyboard. On-device keyboard decoders must operate under strict latency and memory constraints, and we present several transducer optimizations that allow for high accuracy decoding under such constraints. Our methods yield substantial accuracy improvements and latency reductions over an existing baseline transliteration keyboard approach. The resulting system was launched for 22 languages in Google Gboard in the first half of 2017.
    On Lattice Generation for Large Vocabulary Speech Recognition
    Johan Schalkwyk
    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan (2017)
    Lattice generation is an essential feature of the decoder for many speech recognition applications. In this paper, we first review lattice generation methods for WFST-based decoding and describe in a uniform formalism two established approaches for state-of-the-art speech recognition systems: the phone pair and the N-best histories approaches. We then present a novel optimization method, pruned determinization followed by minimization, that produces a deterministic minimal lattice that retains all paths within specified weight and lattice size thresholds. Experimentally, we show that before optimization, the phone pair and the N-best histories approaches each have conditions where they perform better when evaluated on video transcription and mixed voice search and dictation tasks. However, once this lattice optimization procedure is applied, the phone pair approach has the lowest oracle WER for a given lattice density by a significant margin. We further show that the pruned determinization presented here is efficient to use during decoding, unlike classical weighted determinization from which it is derived. Finally, we consider on-the-fly lattice rescoring in which the lattice generation and combination with the secondary LM are done in one step. We compare the phone pair and N-best histories approaches for this scenario and find the former superior in our experiments.
    We present a new algorithm for efficiently training n-gram language models on uncertain data, and illustrate its use for semi-supervised language model adaptation. We compute the probability that an n-gram occurs k times in the sample of uncertain data, and use the resulting histograms to derive a generalized Katz backoff model. We compare semi-supervised adaptation of language models for YouTube video speech recognition in two conditions: when using full lattices with our new algorithm versus just the 1-best output from the baseline speech recognizer. Unlike 1-best methods, the new algorithm provides models that yield solid improvements over the baseline on the full test set, and, further, achieves these gains without hurting performance on any of the channels. We show that channels with the most data yielded the largest gains. The algorithm was implemented via a new semiring in the OpenFst library and will be released as part of the OpenGrm ngram library.
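    The histogram of "this n-gram occurs k times" probabilities can be illustrated with a simple convolution, assuming (purely for illustration) independent occurrence probabilities at candidate positions; the paper computes these quantities over lattices with a dedicated semiring rather than this toy scheme:

    ```python
    def count_distribution(probs):
        """Distribution over how many times an n-gram occurs, given independent
        occurrence probabilities (one per candidate position). Builds the
        Poisson-binomial distribution by repeated convolution."""
        dist = [1.0]  # occurs 0 times with probability 1 before any position
        for p in probs:
            new = [0.0] * (len(dist) + 1)
            for k, mass in enumerate(dist):
                new[k] += mass * (1 - p)      # position does not contain it
                new[k + 1] += mass * p        # position does contain it
            dist = new
        return dist
    ```

    Entry k of the result is the probability the n-gram occurs exactly k times, which is exactly the histogram a generalized Katz estimator would consume.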
    Distributed representation and estimation of WFST-based n-gram models
    Proceedings of the ACL Workshop on Statistical NLP and Weighted Automata (StatFSM) (2016), pp. 32-41
    We present methods for partitioning a weighted finite-state transducer (WFST) representation of an n-gram language model into multiple shards, each of which is a stand-alone WFST n-gram model in its own right, allowing processing with existing algorithms. After independent estimation, including normalization, smoothing and pruning on each shard, the shards can be merged into a single WFST that is identical to the model that would have resulted from estimation without sharding. We then present an approach that uses data partitions in conjunction with WFST sharding to estimate models on orders-of-magnitude more data than would have otherwise been feasible with a single process. We present some numbers on shard characteristics when large models are trained from a very large data set. Functionality to support distributed n-gram modeling has been added to the OpenGrm library.
    Contextual prediction models for speech recognition
    Yoni Halpern
    Keith Hall
    Vlad Schogol
    Martin Baeuml
    Proceedings of Interspeech 2016
    We introduce an approach to biasing language models towards known contexts without requiring separate language models or explicit contextually-dependent conditioning contexts. We do so by presenting an alternative ASR objective, where we predict the acoustics and words given the contextual cue, such as the geographic location of the speaker. A simple factoring of the model results in an additional biasing term, which effectively indicates how correlated a hypothesis is with the contextual cue (e.g., given the hypothesized transcript, how likely is the user's known location). We demonstrate that this factorization allows us to train relatively small contextual models which are effective in speech recognition. An experimental analysis shows both a perplexity reduction and a significant word error rate reduction on a voice search task when using the user's location as a contextual cue.
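    The biasing term from the factoring above can be sketched as a simple log-domain adjustment. Function names and the `scale` knob are illustrative assumptions, not the paper's notation:

    ```python
    def biased_score(asr_logp, context_logp_given_words, context_logp_prior, scale=1.0):
        """Adjust an ASR hypothesis log-score by the biasing term
        log P(context | words) - log P(context): positive when the hypothesis
        is more correlated with the contextual cue than average."""
        return asr_logp + scale * (context_logp_given_words - context_logp_prior)
    ```

    When the contextual cue is no more likely under the hypothesis than under the prior, the term vanishes and the baseline score is unchanged; hypotheses that explain the cue get boosted.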
    Composition-based on-the-fly rescoring for salient n-gram biasing
    Keith Hall
    Eunjoon Cho
    Noah Coccaro
    Kaisuke Nakajima
    Linda Zhang
    Interspeech 2015, International Speech Communication Association
    This paper describes a new method for building compact context-dependency transducers for finite-state transducer-based ASR decoders. Instead of the conventional phonetic decision tree growing followed by FST compilation, this approach incorporates the phonetic context splitting directly into the transducer construction. The objective function of the split optimization is augmented with a regularization term that measures the number of transducer states introduced by a split. We give results on a large spoken-query task for various n-phone orders and other phonetic features that show this method can greatly reduce the size of the resulting context-dependency transducer with no significant impact on recognition accuracy. This permits using context sizes and features that might otherwise be unmanageable.
    Pushdown automata in statistical machine translation
    Bill Byrne
    Adrià de Gispert
    Gonzalo Iglesias
    Computational Linguistics, vol. 40 (2014), pp. 687-723
    Smoothed marginal distribution constraints for language modeling
    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL) (2013), pp. 43-52
    We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney smoothing. Unlike Kneser-Ney, our approach is designed to be applied to any given smoothed backoff model, including models that have already been heavily pruned. As a result, the algorithm avoids issues observed when pruning Kneser-Ney models (Siivola et al., 2007; Chelba et al., 2010), while retaining the benefits of such marginal distribution constraints. We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. An open-source version of the algorithm has been released as part of the OpenGrm ngram library.
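    For readers unfamiliar with the backoff n-gram models being re-estimated above, the standard lookup they define can be sketched directly. The dict-based model representation here is an illustrative assumption (real toolkits use ARPA files or WFSTs):

    ```python
    def backoff_prob(model, ngram):
        """Standard backoff n-gram lookup: use the n-gram's stored probability
        if present, otherwise multiply the context's backoff weight by the
        probability of the shortened (lower-order) n-gram."""
        probs, backoffs = model  # dicts: ngram tuple -> prob, context tuple -> weight
        if ngram in probs:
            return probs[ngram]
        if len(ngram) == 1:
            return 0.0  # unseen unigram in this toy model
        context, shorter = ngram[:-1], ngram[1:]
        return backoffs.get(context, 1.0) * backoff_prob(model, shorter)
    ```

    The paper's algorithm adjusts the stored probabilities and backoff weights so that marginals of this recursive distribution match given target marginals.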
    Mobile Music Modeling, Analysis and Recognition
    Pavel Golik
    Boulos Harb
    Alex Rudnick
    International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2012)
    We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [Mohri et al., 2010]. We accomplish this by adapting the features specifically to this task, and by introducing new modeling techniques that enable using a corpus of noisy and channel-distorted data to improve mobile music recognition quality. We report the results of an extensive empirical investigation of the system's robustness under realistic channel effects and distortions. We show an improvement of recognition accuracy by explicit duration modeling of music phonemes and by integrating the expected noise environment into the training process. Finally, we propose the use of frame-to-phoneme alignment for high-level structure analysis of polyphonic music.
    Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
    Johan Schalkwyk
    Boulos Harb
    Peng Xu
    Preethi Jyothi
    Thorsten Brants
    Vida Ha
    Will Neveitt
    University of Toronto (2012)
    A critical component of a speech recognition system targeting web search is the language model. The talk presents an empirical exploration of the google.com query stream with the end goal of high quality statistical language modeling for mobile voice search. Our experiments show that after text normalization the query stream is not as "wild" as it seems at first sight. One can achieve out-of-vocabulary rates below 1% using a one million word vocabulary, and excellent n-gram hit ratios of 77/88% even at high orders such as n=5/4, respectively. Using large scale, distributed language models can improve performance significantly, with up to 10% relative reductions in word error rate over conventional models used in speech recognition. We also find that the query stream is non-stationary, which means that adding more past training data beyond a certain point provides diminishing returns, and may even degrade performance slightly. Perhaps less surprisingly, we have shown that locale matters significantly for English query data across USA, Great Britain and Australia. In an attempt to leverage the speech data in voice search logs, we successfully build large-scale discriminative N-gram language models and derive small but significant gains in recognition performance.
    Hierarchical Phrase-Based Translation Representations
    Gonzalo Iglesias
    William Byrne
    Adrià de Gispert
    Proceedings of EMNLP 2011
    Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
    Johan Schalkwyk
    Boulos Harb
    Peng Xu
    Thorsten Brants
    Vida Ha
    Will Neveitt
    OGI/OHSU Seminar Series, Portland, Oregon, USA (2011)
    The talk presents key aspects faced when building language models (LM) for the google.com query stream, and their use for automatic speech recognition (ASR). Distributed LM tools enable us to handle a huge amount of data, and experiment with LMs that are two orders of magnitude larger than usual. An empirical exploration of the problem led us to rediscovering a lesser-known interaction between Kneser-Ney smoothing and entropy pruning, possible non-stationarity of the query stream, as well as strong dependence on various English locales (USA, Britain and Australia). LM compression techniques allowed us to use LMs with one billion n-grams in the first pass of an ASR system built on FST technology, and evaluate empirically whether a two-pass system architecture has any losses over one pass.
    A Filter-based Algorithm for Efficient Composition of Finite-State Transducers
    Johan Schalkwyk
    International Journal of Foundations of Computer Science, vol. 22 (2011), pp. 1781-1795
    This paper explores various static interpolation methods for approximating a single dynamically-interpolated language model used for a variety of recognition tasks on the Google Android platform. The goal is to find the statically-interpolated first-pass LM that best reduces search errors in a two-pass system, or that even allows eliminating the more complex dynamic second pass entirely. Static interpolation weights that are uniform, prior-weighted, and the maximum likelihood, maximum a posteriori, and Bayesian solutions are considered. Analysis and recognition experiments on Android test data show that a Bayesian interpolation approach performs best.
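    Whatever scheme chooses the weights (uniform, MAP, Bayesian, and so on), the static interpolation itself is just a fixed convex combination of component LM probabilities. A minimal sketch with illustrative names:

    ```python
    def interpolate(lm_probs, weights):
        """Statically interpolate component LM probabilities for one word:
        sum_i w_i * P_i(word | history), with non-negative weights summing
        to one so the result is again a probability."""
        return sum(w * p for w, p in zip(weights, lm_probs))
    ```

    The paper's question is how to pick `weights` once, offline, so the single interpolated model best approximates a per-task dynamically interpolated one.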
    This paper describes a new method for building compact context-dependency transducers for finite-state transducer-based ASR decoders. Instead of the conventional phonetic decision tree growing followed by FST compilation, this approach incorporates the phonetic context splitting directly into the transducer construction. The objective function of the split optimization is augmented with a regularization term that measures the number of transducer states introduced by a split. We give results on a large spoken-query task for various n-phone orders and other phonetic features that show this method can greatly reduce the size of the resulting context-dependency transducer with no significant impact on recognition accuracy. This permits using context sizes and features that might otherwise be unmanageable.
    This paper describes a weighted finite-state transducer composition algorithm that generalizes the notion of the composition filter and presents filters that remove useless epsilon paths and push forward labels and weights along epsilon paths. This filtering allows us to compose together large speech recognition context-dependent lexicons and language models much more efficiently in time and space than previously possible. We present experiments on Broadcast News and Google Search by Voice that demonstrate a 5% to 10% overhead for dynamic, runtime composition compared to a static, offline composition of the recognition transducer. To our knowledge, this is the first such system with such small overhead.
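    For orientation, the underlying composition operation that the filters optimize can be sketched in its plainest form: an unweighted, epsilon-free composition over dict-based transducers (illustrative data layout; OpenFst's actual algorithm is weighted and filter-driven):

    ```python
    def compose(arcs_a, arcs_b, start_a, start_b):
        """Naive epsilon-free composition of two transducers given as
        {state: [(ilabel, olabel, nextstate), ...]} maps: pair up states and
        match A's output labels against B's input labels."""
        out, stack, seen = {}, [(start_a, start_b)], set()
        while stack:
            qa, qb = stack.pop()
            if (qa, qb) in seen:
                continue
            seen.add((qa, qb))
            out[(qa, qb)] = []
            for il, ol, na in arcs_a.get(qa, []):
                for il2, ol2, nb in arcs_b.get(qb, []):
                    if ol == il2:  # A's output feeds B's input
                        out[(qa, qb)].append((il, ol2, (na, nb)))
                        stack.append((na, nb))
        return out
    ```

    With epsilon labels present, this naive pairing generates redundant paths; the composition filters described in the paper exist precisely to prune those and to push labels and weights along epsilon paths.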
    Web Derived Pronunciations for Spoken Term Detection
    Doğan Can
    Erica Cooper
    Arnab Ghoshal
    Martin Jansche
    Sanjeev Khudanpur
    Bhuvana Ramabhadran
    Murat Saraçlar
    Abhinav Sethy
    Morgan Ulinski
    Christopher White
    32nd Annual International ACM SIGIR Conference (2009), pp. 83-90
    Indexing and retrieval of speech content in various forms such as broadcast news, customer care data and on-line media has gained a lot of interest for a wide range of applications, from customer analytics to on-line media search. For most retrieval applications, the speech content is typically first converted to a lexical or phonetic representation using automatic speech recognition (ASR). The first step in searching through indexes built on these representations is the generation of pronunciations for named entities and foreign language query terms. This paper summarizes the results of the work conducted during the 2008 JHU Summer Workshop by the Multilingual Spoken Term Detection team, on mining the web for pronunciations and analyzing their impact on spoken term detection. We first present methods to use the vast amount of pronunciation information available on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We then present an analysis of the effectiveness of using these pronunciations to represent Out-Of-Vocabulary (OOV) query terms on the performance of a spoken term detection (STD) system. We provide comparisons of Web pronunciations against automated techniques for pronunciation generation as well as pronunciations generated by human experts. Our results cover a range of speech indexes based on lattices, confusion networks and one-best transcriptions at both word and word-fragment levels.
    Web-derived Pronunciations
    Arnab Ghoshal
    Martin Jansche
    Sanjeev Khudanpur
    Morgan Ulinski
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 4289-4292
    Pronunciation information is available in large quantities on the Web, in the form of IPA and ad-hoc transcriptions. We describe techniques for extracting candidate pronunciations from Web pages and associating them with orthographic words, filtering out poorly extracted pronunciations, normalizing IPA pronunciations to better conform to a common transcription standard, and generating phonemic representations from ad-hoc transcriptions. We show improvements on a letter-to-phoneme task when using web-derived vs. Pronlex pronunciations.
    OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language
    Martin Jansche
    Proceedings of the North American Chapter of the Association for Computational Linguistics -- Human Language Technologies (NAACL HLT) 2009 conference, Tutorials
    Finite-state methods are well established in language and speech processing. OpenFst (available from www.openfst.org) is a free and open-source software library for building and using finite automata, in particular, weighted finite-state transducers (FSTs). This tutorial is an introduction to weighted finite-state transducers and their uses in speech and language processing. While there are other weighted finite-state transducer libraries, OpenFst (a) offers, we believe, the most comprehensive, general and efficient set of operations; (b) makes available full source code; (c) exposes high- and low-level C++ APIs that make it easy to embed and extend; and (d) is a platform for active research and use among many colleagues.
    On the Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    International Journal of Foundations of Computer Science, vol. 19 (2008), pp. 219-242
    Sample Selection Bias Correction Theory
    Proceedings of The 19th International Conference on Algorithmic Learning Theory (ALT 2008), Springer, Heidelberg, Germany, Budapest, Hungary
    Speech Recognition with Weighted Finite-State Transducers
    Handbook on Speech Processing and Speech Communication, Part E: Speech recognition, Springer-Verlag, Heidelberg, Germany (2008)
    OpenFst: a General and Efficient Weighted Finite-State Transducer Library
    Johan Schalkwyk
    Wojciech Skut
    Proceedings of the 12th International Conference on Implementation and Application of Automata (CIAA 2007), Springer-Verlag, Heidelberg, Germany, Prague, Czech Republic
    Efficient Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    Proceedings of the 7th Latin American Symposium (LATIN 2006), Springer-Verlag, Heidelberg, Germany, Valdivia, Chile
    Automata and Graph Compression
    Ananda Theertha Suresh
    CoRR, vol. abs/1502.07288 (2015)
    Automata and graph compression
    Ananda Theertha Suresh
    ISIT (2015), pp. 2989-2993
    Efficient Computation of the Relative Entropy of Probabilistic Automata
    Ashish Rastogi
    LATIN (2006), pp. 323-336
    MAP adaptation of stochastic grammars
    Computer Speech and Language, vol. 20 (2006), pp. 41-68
    Weighted Automata in Text and Speech Processing
    arXiv, vol. abs/cs/0503077 (2005)
    A Generalized Construction of Integrated Speech Recognition Transducers
    Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Montreal, Canada
    Statistical Modeling for Unit Selection in Speech Synthesis
    42nd Meeting of the Association for Computational Linguistics (ACL 2004), Proceedings of the Conference, Barcelona, Spain
    Voice Signatures
    Proceedings of The 8th IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2003), St. Thomas, U.S. Virgin Islands
    Weighted finite-state transducers in speech recognition
    Computer Speech & Language, vol. 16 (2002), pp. 69-88
    An Efficient Algorithm for the N-Best-Strings Problem
    Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado
    Weighted Finite-State Transducers in Speech Recognition (Tutorial)
    Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado
    A Comparison of Two LVR Search Optimization Techniques
    Stephan Kanthak
    Hermann Ney
    Proceedings of the International Conference on Spoken Language Processing 2002 (ICSLP '02), Denver, Colorado
    A Weight Pushing Algorithm for Large Vocabulary Speech Recognition
    Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech '01), Aalborg, Denmark (2001)
    Weighted Finite-State Transducers in Speech Recognition
    Proceedings of the ISCA Tutorial and Research Workshop, Automatic Speech Recognition: Challenges for the New Millennium (ASR2000), Paris, France
    The Design Principles of a Weighted Finite-State Transducer Library
    Fernando C. N. Pereira
    Theoretical Computer Science, vol. 231 (2000), pp. 17-32
    Network Optimizations for Large Vocabulary Speech Recognition
    Speech Communication, vol. 28 (1999), pp. 1-12
    Rapid Unit Selection from a Large Speech Corpus for Concatenative Speech Synthesis
    Mark Beutnagel
    Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary (1999)
    Efficient General Lattice Generation and Rescoring
    Andrej Ljolje
    EUROSPEECH 99 (1999), pp. 1251-1254
    Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition
    Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99), Budapest, Hungary (1999)
    Full expansion of context-dependent networks in large vocabulary speech recognition
    Donald Hindle
    Andrej Ljolje
    Fernando C. N. Pereira
    ICASSP (1998), pp. 665-668
    A Rational Design for a Weighted Finite-State Transducer Library
    Proceedings of the Second International Workshop on Implementing Automata (WIA '97), Springer-Verlag, Berlin-NY (1998), pp. 144-158
    Weighted Determinization and Minimization for Large Vocabulary Speech Recognition
    Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece (1997)
    Transducer Composition for Context-Dependent Network Expansion
    Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece (1997)
    Speech Recognition by Composition of Weighted Finite Automata
    Finite-State Language Processing, MIT Press, Cambridge, Massachusetts (1997), pp. 431-453
    Weighted Automata in Text and Speech Processing
    Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language, John Wiley and Sons, Chichester, Budapest, Hungary (1996)
    Compilation of Weighted Finite-State Transducers from Decision Trees
    ACL (1996), pp. 215-222
    Algorithms for Speech Recognition and Language Processing
    CoRR, vol. cmp-lg/9608018 (1996)
    Rational Power Series in Text and Speech Processing
    Graduate course, University of Pennsylvania, Department of Computer Science, Philadelphia, PA (1996)
    Finite-State Transducers in Language and Speech Processing
    Tutorial at the 16th International Conference on Computational Linguistics (COLING-96), COLING, Copenhagen, Denmark (1996)
    Compilation of Weighted Finite-State Transducers from Decision Trees
    CoRR, vol. cmp-lg/9606018 (1996)
    The AT&T 60,000 Word Speech-to-Text System
    Andrej Ljolje
    Don Hindle
    Eurospeech'95: ESCA 4th European Conference on Speech Communication and Technology, Madrid, Spain (1995), pp. 207-210
    Weighted Rational Transductions and their Application to Human Language Processing
    Human Language Technology Workshop, Morgan Kaufmann, San Francisco, California (1994), pp. 262-267
    Efficient Grammar Processing for a Spoken Language Translation System
    David B. Roe
    Alejandro Macarrón
    Proceedings of ICASSP, IEEE, San Francisco, California (1992), pp. 213-216
    A spoken language translator for restricted-domain context-free languages
    David B. Roe
    Alejandro Macarrón
    Speech Communication, vol. 11 (1992), pp. 311-319
    Toward a Spoken Language Translator for Restricted-Domain Context-Free Languages
    David B. Roe
    Alejandro Macarrón
    EUROSPEECH 91 -- 2nd European Conference on Speech Communication and Technology, Genova, Italy (1991), pp. 1063-1066