Factorized Recurrent Neural Architectures for Long Range Dependence

Francois Belletti
Alex Beutel
Sagar Jain
AIStats 2018

Abstract

The ability to capture Long Range Dependence (LRD) in a stochastic process is of prime importance for predictive models. A sequential model with longer-term memory is able to better contextualize recent observations. In this article, we apply the theory of LRD stochastic processes to modern recurrent architectures such as LSTMs and GRUs and prove that they do not exhibit LRD behavior under homoscedasticity assumptions. Having proven that leaky gating mechanisms lead to memory loss in gated recurrent networks such as LSTMs and GRUs, we provide an architecture that attempts to address this faulty memory. The key insight of our theoretical study is to encourage memory redundancy. We show how the resulting architectures are more lightweight, parallelizable and able to leverage old observations. Experimental results on a synthetic copy task, the YouTube-8M video classification task and a latency-sensitive recommender system show that our approach leads to better memorization.
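The memory-loss claim about leaky gating can be illustrated with a scalar leaky recurrence of the form used inside GRU/LSTM update gates. The sketch below is an illustration only, not the paper's proof or architecture, and the constant gate value z = 0.9 is an arbitrary choice: with a gate bounded away from 1, the influence of an observation made k steps earlier is weighted by z**k and therefore decays geometrically, which is the opposite of LRD.

# Minimal sketch (illustrative, not from the paper): impulse response of the
# scalar leaky recurrence h_t = z * h_{t-1} + (1 - z) * x_t with constant gate z.
import numpy as np

T = 200                      # sequence length
z = 0.9                      # constant update gate (hypothetical value)
x = np.zeros(T)
x[0] = 1.0                   # single impulse observed at t = 0

h = 0.0
influence = []
for t in range(T):
    h = z * h + (1.0 - z) * x[t]   # leaky gated update
    influence.append(h)

# The contribution of x_0 to h_t is (1 - z) * z**t, so it vanishes
# geometrically after roughly 1 / (1 - z) steps.
print(influence[0], influence[10], influence[100])

Running this prints roughly 0.1, 0.035 and 2.7e-6, showing how quickly the old observation is forgotten; redundant memory paths with gates closer to 1 (or held open) are one way to slow this decay, which is the direction the abstract's "memory redundancy" insight points to.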
