Google Research

Visual Representations for Semantic Target Driven Navigation

  • Arsalan Mousavian
  • Marek Fiser
  • James Davidson
  • Jana Kosecka
  • Alexander Toshev
International Conference on Robotics and Automation (2019)


One of the fundamental challenges for a robotic agent is to navigate in complex environments and find objects of interest, e.g. "go to the refrigerator". In this work we address this challenge in the context of agents defined as neural networks and operating in the real world.
Specifically, we address the question of what makes a good visual representation, one that captures not only spatial layout but also semantic contextual cues. We propose to use segmentation and detection masks obtained by off-the-shelf state-of-the-art vision algorithms. Such a representation allows additional relevant data to be used to better train different parts of the model: the representation extraction is trained on large standard vision datasets, while the navigation component utilizes large synthetic environments. The latter is possible because such environments come with segmentation and detection masks, and thus no domain adaptation is needed. The resulting navigation system utilizes larger and more powerful controllers compared to other learning-based approaches. Further, it can be readily applied to real, non-synthetic environments, as demonstrated on the Active Vision Dataset.
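
To make the idea of a mask-based observation concrete, here is a minimal sketch of how a per-pixel segmentation map and a set of detection boxes could be turned into per-class binary channels. It is an illustration only, not the paper's implementation: the function names, the box format, the two toy classes, and the choice to stack segmentation and detection channels together are all assumptions made for this example.

```python
from typing import List, Tuple

Mask = List[List[int]]  # H x W grid of 0/1 values
# Hypothetical box format: (class_id, x_min, y_min, x_max, y_max), max exclusive.
Box = Tuple[int, int, int, int, int]


def segmentation_to_masks(label_map: List[List[int]], num_classes: int) -> List[Mask]:
    """Turn an H x W map of class ids into one binary mask per class."""
    return [
        [[1 if label == c else 0 for label in row] for row in label_map]
        for c in range(num_classes)
    ]


def detections_to_masks(
    boxes: List[Box], num_classes: int, height: int, width: int
) -> List[Mask]:
    """Rasterize detection boxes into one binary mask per class."""
    masks = [[[0] * width for _ in range(height)] for _ in range(num_classes)]
    for class_id, x_min, y_min, x_max, y_max in boxes:
        # Clip each box to the image bounds before filling it in.
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                masks[class_id][y][x] = 1
    return masks


# Toy 2 x 3 frame with two assumed classes: 0 = "wall", 1 = "refrigerator".
label_map = [[0, 0, 1],
             [0, 1, 1]]
seg_masks = segmentation_to_masks(label_map, num_classes=2)
det_masks = detections_to_masks([(1, 1, 0, 3, 2)], num_classes=2, height=2, width=3)

# Stack both sets of channels into a single observation (4 channels, each 2 x 3).
observation = seg_masks + det_masks
```

Because the channels carry only class membership and not raw pixel appearance, the same format can be produced from a simulator's ground-truth labels or from a vision model run on real images, which is the property the abstract relies on when it says no domain adaptation is needed.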
