BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning

Chelsea Finn
Corey Harrison Lynch
Daniel Kappler
Eric Victor Jang
Frederik Ebert
Mohi Khansari
Sergey Levine
Conference on Robot Learning (2021)

Abstract

In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize across diverse scenes and diverse tasks, a long-standing challenge in robot learning. We approach this challenge from an imitation learning perspective, studying how scaling and broadening the collected data can facilitate generalization to new scenes and tasks. To that end, we develop a shared-autonomy system for demonstrating correct behavior to the robot, along with an imitation learning method that can flexibly condition on task embeddings computed from language or video. Using this system, we scale data collection to dozens of scenes and over 100 tasks, and investigate how various design choices translate to performance. We show that our system enables a real robot, using the same neural network architecture for learning policies, to pick objects from a bin at 4 objects per minute, open swing doors and latched doors it has never seen before (success rates of 94% and 27%, respectively), and perform at least dozens of unseen manipulation tasks with a success rate of 50%.