Google Research

Visionary: Vision Architecture Discovery for Robot Learning

International Conference on Robotics and Automation (ICRA) (2021)


We propose a vision-based architecture search algorithm for learning robot manipulation tasks, which discovers interactions between low-dimensional action inputs and high-dimensional visual inputs. The architectures are designed automatically while training for the task itself, and they can discover novel ways of combining action and image feature inputs as well as features from previous stages of learning. The discovered architectures achieved higher task success rates than a recent high-performing baseline, in some cases by a large margin. Our real-robot experiments also uncovered architectures that improve grasping performance by 6%. This is the first approach to demonstrate that a tailored architecture can be modified and trained simultaneously for a real-robot task.
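
As a rough illustration of what "combining action and image feature inputs" can mean, the sketch below tiles a low-dimensional action vector across every spatial cell of an image feature map and merges the two by concatenation or by element-wise addition. This is only a minimal sketch under stated assumptions: the helper name `fuse_action_with_features`, the two modes, and the nested-list feature layout are hypothetical and are not taken from the paper, whose actual search space may differ.

```python
from typing import List

# Assumed layout for illustration: height x width x channels.
FeatureMap = List[List[List[float]]]


def fuse_action_with_features(
    features: FeatureMap, action: List[float], mode: str = "concat"
) -> FeatureMap:
    """Combine an action vector with every spatial cell of a feature map.

    Hypothetical helper for illustration; not the paper's search space.
    """
    fused = []
    for row in features:
        fused_row = []
        for cell in row:
            if mode == "concat":
                # Tile the action vector onto each cell's channel vector.
                fused_row.append(list(cell) + list(action))
            elif mode == "add":
                # Element-wise addition requires matching channel counts.
                if len(cell) != len(action):
                    raise ValueError("add mode needs len(action) == channels")
                fused_row.append([c + a for c, a in zip(cell, action)])
            else:
                raise ValueError(f"unknown mode: {mode}")
        fused.append(fused_row)
    return fused


features = [[[1.0, 2.0], [3.0, 4.0]]]  # 1 x 2 feature map with 2 channels
action = [0.5, -0.5]                   # low-dimensional action input
concat_out = fuse_action_with_features(features, action, "concat")
add_out = fuse_action_with_features(features, action, "add")
```

In an architecture search setting, choices such as the fusion mode and the layer at which fusion happens would be treated as searchable decisions rather than fixed by hand; here they are fixed only to keep the example small.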
