Fergus Henderson
Fergus Henderson has been a software engineer at Google since 2006. He started programming as a kid in 1979 and went on to academic research in programming language design and implementation. With his PhD supervisor, he co-founded a research group at the University of Melbourne that developed the programming language Mercury. He has been a program committee member for eight international conferences and has released over 500,000 lines of open-source code. He is a former moderator of the Usenet newsgroup comp.std.c++ and was an officially accredited “Technical Expert” for the ISO C and C++ committees. He also has over 15 years of commercial software industry experience, starting with his first full-time industry job as a COBOL programmer (for the Australian company Frontier Software) at the age of 16. He spent 2.5 years working at Galois, Inc., in Portland, Oregon, where he developed a compiler from Cryptol (a domain-specific functional programming language for cryptography) to FPGA hardware. At Google, he was one of the original developers of Blaze, a build tool now used across Google, and worked on the server-side software behind speech recognition and voice actions (before Siri!) and speech synthesis. He currently manages Google's text-to-speech engineering team, but still writes and reviews plenty of code. Software that he has written is installed on over a billion devices.
Authored Publications
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Yannis Agiomyrgiannakis
Niels Egberts
Przemysław Szczepaniak
Proc. Interspeech, San Francisco, CA, USA (2016), pp. 2273-2277
Acoustic models based on long short-term memory recurrent neural networks (LSTM-RNNs) were applied to statistical parametric speech synthesis (SPSS) and showed significant improvements in naturalness and latency over those based on hidden Markov models (HMMs). This paper describes further optimizations of LSTM-RNN-based SPSS for deployment on mobile devices: weight quantization, multi-frame inference, and robust inference using an ε-contaminated Gaussian loss function. Experimental results in subjective listening tests show that these optimizations can make LSTM-RNN-based SPSS comparable to HMM-based SPSS in runtime speed while maintaining naturalness. Evaluations between LSTM-RNN-based SPSS and HMM-driven unit selection speech synthesis are also presented.