Google Research

Scene Memory Transformer for Embodied Agents in Long Time Horizon Tasks

  • Kuan Fang
  • Fei-Fei Li
  • Silvio Savarese
  • Alexander Toshev
CVPR 2019


Many robotic applications require a policy to perform tasks over a long time horizon in large environments. In such applications, decision making at any step can depend on states observed far in the past. Hence, being able to properly memorize past observations is crucial. In this work we bring recent advances in neural language understanding~\cite{Vaswani2017AttentionIA} to robotics. We propose a novel memory-based policy, called Scene Memory Transformer (SMT). This model is generic, makes no assumptions about the concrete application, and can be efficiently trained with Reinforcement Learning over long episodes. On a range of challenging navigation tasks, SMT demonstrates superior performance to other established stateful models by a margin over long episodes. We show that the proposed model is robust to noise and can utilize long-term dependencies in its memory. Videos and supplementary material can be found at \url{}
