Scene Memory Transformer for Embodied Agents in Long Time Horizon Tasks

Kuan Fang
Fei-Fei Li
Silvio Savarese
Alexander Toshev
CVPR 2019


Many robotic applications require a policy to perform tasks over a long time horizon in large environments. In such applications, decision making at any step can depend on states observed far in the past. Hence, being able to properly memorize past observations is crucial. In this work we bring recent advances in neural language understanding~\cite{Vaswani2017AttentionIA} to robotics. We propose a novel memory-based policy, called Scene Memory Transformer (SMT). This model is generic, makes no assumptions about the concrete application, and can be trained efficiently with Reinforcement Learning over long episodes. On a range of challenging navigation tasks, SMT outperforms other established stateful models by a margin over long episodes. We show that the proposed model is robust to noise and can utilize long-term dependencies in its memory. Videos and supplementary material can be found at \url{}