Visual Representations for Semantic Target Driven Navigation

Arsalan Mousavian
Marek Fiser
James Davidson
Jana Kosecka
Alexander Toshev
International Conference on Robotics and Automation (2019)


One of the fundamental challenges for a robotic agent is to navigate complex environments and find objects of interest, e.g., go to the refrigerator. In this work we address this challenge in the context of agents defined as neural networks and operating in the real world. Specifically, we address the question of a good visual representation, one that captures not only spatial layout but also semantic contextual cues. We propose to use segmentation and detection masks obtained by off-the-shelf state-of-the-art vision algorithms. Such a representation allows different parts of the model to be trained on additional relevant data: the representation extraction is trained on large standard vision datasets, while the navigation component utilizes large synthetic environments. The latter is possible because such environments come with segmentation and detection masks, so no domain adaptation is needed. The resulting navigation system utilizes larger and more powerful controllers than other learning-based approaches. Further, it can be readily applied to real, non-synthetic environments, as demonstrated on the Active Vision Dataset \cite{active-vision-dataset2017}.
