HYBRID LSTM-FSMN NETWORKS FOR ACOUSTIC MODELING

Asa Oines
Pedro Moreno
(2018)

Abstract

This paper describes a series of experiments with neural networks containing long short-term memory (LSTM) [1] and feedforward sequential memory network (FSMN) [2, 3, 4] layers, trained with the connectionist temporal classification (CTC) [5] criterion for acoustic modeling. We propose a hybrid LSTM/FSMN (FLMN) architecture as an enhancement to conventional LSTM-only acoustic models. The addition of FSMN layers allows the network to model a fixed-size representation of future context, which makes it suitable for online speech recognition. Our experiments show that FLMN acoustic models significantly outperform conventional LSTM-only acoustic models. We also compare the FLMN architecture with other methods of modeling future context. Finally, we present a modification of the FSMN architecture that improves performance by reducing the width of the FSMN output.
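
To illustrate what a "fixed-size representation of future context" can look like, the following is a minimal NumPy sketch of a vectorized FSMN-style memory block: each output frame is a learned, per-dimension weighted sum of the current frame, a fixed number of past frames, and a fixed number of future frames. It follows the general FSMN formulation from the cited literature, not necessarily the exact layer used in this paper. The function name `fsmn_memory`, the tap arrays `a` and `c`, and the tap counts are assumptions chosen for illustration.

```python
import numpy as np


def fsmn_memory(h, a, c):
    """Apply an FSMN-style memory block to a sequence of hidden activations.

    Hypothetical helper for illustration only.
    h: (T, D) hidden activations, one row per frame.
    a: (N1 + 1, D) lookback taps; a[0] weights the current frame.
    c: (N2, D) lookahead taps; c[j - 1] weights the frame j steps ahead.
    Frames outside the sequence are treated as zeros.
    """
    h = np.asarray(h, dtype=float)
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    T = h.shape[0]
    out = np.zeros(h.shape, dtype=float)

    # Past context: frame t receives a[i] * h[t - i].
    for i in range(min(a.shape[0], T)):
        out[i:] += a[i] * h[:T - i]

    # Future context: frame t receives c[j - 1] * h[t + j].
    # Only N2 future frames are ever needed, so the lookahead is bounded.
    for j in range(1, min(c.shape[0], T) + 1):
        out[:T - j] += c[j - 1] * h[j:]

    return out


# Toy usage: 5 frames, 1 dimension, one lookback tap and one lookahead tap.
h_demo = np.arange(1.0, 6.0).reshape(5, 1)
a_demo = np.array([[1.0], [0.5]])
c_demo = np.array([[0.25]])
memory_demo = fsmn_memory(h_demo, a_demo, c_demo)
```

Because the number of lookahead taps is fixed, the output for a frame can be emitted as soon as that many future frames have arrived, which is what keeps the added latency bounded in a streaming setting.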
