Google Research




This paper describes a series of experiments with neural networks containing long short-term memory (LSTM) [1] and feedforward sequential memory network (FSMN) [2, 3, 4] layers trained with the connectionist temporal classification (CTC) [5] criteria for acoustic modeling. We propose using a hybrid LSTM/FSMN (FLMN) architecture as an enhancement to conventional LSTM-only acoustic models. The addition of FSMN layers allows the network to model a fixed size representation of future context suitable for online speech recognition. Our experiments show that FLMN acoustic models significantly outperform conventional LSTM. We also compare the FLMN architecture with other methods of modeling future context. Finally, we present a modification of the FSMN architecture that improves performance by reducing the width of the FSMN output.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work