Eidetic 3D LSTM: A Model for Video Prediction and Beyond

Yunbo Wang; Lu Jiang; Ming-hsuan Yang; Jia Li; Mingsheng Long; Fei-Fei Li

ICLR (2019)

Abstract

Spatiotemporal predictive learning, though long considered a promising self-supervised feature learning method, seldom shows its effectiveness beyond future video prediction. The reason is that it is difficult to learn good representations for both short-term frame dependency and long-term high-level relations. We present a new model, Eidetic 3D LSTM (E3D-LSTM), that integrates 3D convolutions into RNNs. The encapsulated 3D-Conv makes the local perceptrons of RNNs motion-aware and enables the memory cell to store better short-term features. For long-term relations, we make the present memory state interact with its historical records via a gate-controlled self-attention module. We describe this memory transition mechanism as eidetic because it can effectively recall stored memories across multiple timestamps, even after long periods of disturbance. We first evaluate the E3D-LSTM network on widely used future video prediction datasets and achieve state-of-the-art performance. We then show that the E3D-LSTM network also performs well on early activity recognition, which infers what is happening or what will happen after observing only a limited number of video frames. This task aligns well with video prediction in modeling action intentions and tendencies.
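The gate-controlled self-attention over historical memory can be illustrated with a small sketch. It is not the authors' implementation: flat Python lists stand in for the 3D feature tensors, the gates are passed in as plain vectors instead of being produced by 3D convolutions, and the function names (`recall`, `eidetic_update`), the residual connection, and the layer normalization are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall(query, history):
    """Attend over past memory states (hypothetical helper).

    query:   recall-gate vector for the current step
    history: list of past memory vectors, oldest first
    Returns the attention-weighted sum of the past memories.
    """
    scores = [sum(q * c for q, c in zip(query, past)) for past in history]
    weights = softmax(scores)
    return [
        sum(w * past[i] for w, past in zip(weights, history))
        for i in range(len(query))
    ]


def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def eidetic_update(recall_gate, input_gate, candidate, history):
    """One assumed memory transition: new input plus normalized recalled memory."""
    recalled = recall(recall_gate, history)
    previous = history[-1]  # most recent memory state
    blended = layer_norm([p + r for p, r in zip(previous, recalled)])
    return [i * g + b for i, g, b in zip(input_gate, candidate, blended)]


# Usage: a query aligned with the oldest memory retrieves mostly that memory.
history = [[1.0, 0.0], [0.0, 1.0]]
recalled = recall([10.0, 0.0], history)
new_memory = eidetic_update([10.0, 0.0], [0.5, 0.5], [0.2, -0.2], history)
```

Because the attention weights depend on how well the query matches each stored state and not on how recent that state is, an old memory can be retrieved even when many unrelated steps lie in between.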
