Google Research

Learning Latent Plans from Play


We propose a self-supervised approach to learning a wide variety of manipulation skills from unlabeled data collected by playing in, and interacting with, a playground environment. Learning from play offers three main advantages:
1) Collecting large amounts of play data is cheap and fast, because it requires neither staging the scene nor labeling data.
2) It removes the need for a discrete, rigid definition of skills or tasks during data collection. This lets the agent acquire a continuum of manipulation skills as a whole, which can then be conditioned to perform a particular skill such as grasping. Furthermore, play data already includes ways to recover, retry, or transition between different skills, which can be used to obtain a reactive closed-loop control policy.
3) It allows new skills to be learned quickly by making use of pre-existing general abilities.

Our approach to learning new skills from unlabeled play data decouples high-level plan prediction from low-level action prediction: first, self-supervised learning of a latent planning space; then, self-supervised learning of an action model conditioned on a latent plan. The result is a single task-agnostic policy conditioned on a user-provided goal, which can perform a variety of tasks in the environment where play was observed. We train a single model on 3 hours of unlabeled play data and evaluate it on 18 tasks simply by feeding it a goal state corresponding to each task. The baseline reaches an accuracy of 65% using 18 specialized policies, each trained on 100 demonstrations of its task (1800 expensive demonstrations in total). Our model completes the tasks with an average accuracy of 85% using a single policy zero-shot (it was never explicitly trained on these tasks) and only cheap unlabeled data. Videos of the performed experiments are available at
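The decoupling of plan prediction from action prediction can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are our assumptions: goals are obtained by cutting a random window from the unlabeled play stream and treating its final state as the goal; a plan recognizer (which sees the whole window) and a plan proposer (which sees only the current state and goal) each output a diagonal Gaussian over the latent plan, tied together by a KL term; and the action model is conditioned on the state, goal and sampled plan. The linear "networks", mean pooling, MSE action loss, window length of 32 and `beta` value are illustrative stand-ins, and every function name is hypothetical.

```python
import numpy as np

# Hypothetical sketch of latent-plan learning from play; not the paper's code.

def sample_play_window(states, actions, window, rng):
    """Assumed hindsight relabeling: take a random window of unlabeled play
    and treat its final state as the goal for that window."""
    start = rng.integers(0, len(states) - window + 1)
    s = states[start:start + window]
    a = actions[start:start + window]
    return s, a, s[-1]

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def plan_recognizer(W, s):
    """Posterior over plans given the whole window (mean pooling is a stand-in)."""
    mu, logvar = np.split(s.mean(axis=0) @ W, 2)
    return mu, logvar

def plan_proposer(W, s0, goal):
    """Prior over plans given only the current state and the goal."""
    mu, logvar = np.split(np.concatenate([s0, goal]) @ W, 2)
    return mu, logvar

def policy(W, s_t, goal, z):
    """Low-level action model conditioned on state, goal and latent plan."""
    return np.concatenate([s_t, goal, z]) @ W

def play_lmp_loss(params, s, a, goal, rng, beta=0.01):
    mu_q, logvar_q = plan_recognizer(params["rec"], s)
    mu_p, logvar_p = plan_proposer(params["prop"], s[0], goal)
    # Reparameterized sample of a plan from the recognizer's posterior.
    z = mu_q + np.exp(0.5 * logvar_q) * rng.standard_normal(mu_q.shape)
    pred = np.stack([policy(params["pi"], s_t, goal, z) for s_t in s])
    recon = np.mean((pred - a) ** 2)  # MSE stands in for the action likelihood
    kl = kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)
    return recon + beta * kl, recon, kl

def act(params, s_t, goal):
    """Test time: no future window exists, so plan from the proposer's mean."""
    mu_p, _ = plan_proposer(params["prop"], s_t, goal)
    return policy(params["pi"], s_t, goal, mu_p)

# Usage on synthetic "play" data (random placeholders, not real robot data).
rng = np.random.default_rng(0)
state_dim, action_dim, latent_dim, T = 6, 2, 4, 500
states = rng.standard_normal((T, state_dim))
actions = rng.standard_normal((T, action_dim))
params = {
    "rec": 0.1 * rng.standard_normal((state_dim, 2 * latent_dim)),
    "prop": 0.1 * rng.standard_normal((2 * state_dim, 2 * latent_dim)),
    "pi": 0.1 * rng.standard_normal((2 * state_dim + latent_dim, action_dim)),
}
s, a, g = sample_play_window(states, actions, window=32, rng=rng)
total, recon, kl = play_lmp_loss(params, s, a, g, rng)
action = act(params, states[0], states[-1])
```

In this sketch, only `act` is needed after training: a user supplies a goal state, the proposer produces a plan from the current state and that goal, and the same policy is reused for every task, which mirrors the single goal-conditioned policy described above.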
