Jump to Content


Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration and reinforcement, all ingredients of human learning that are still not well understood or exploited through the supervised approaches that dominate deep learning today. Our goal is to improve robotics via machine learning, and improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.

Recent Publications

Preview abstract In recent years, much progress has been made in learning robotic manipulation policies that can follow natural language instructions. Common approaches involve learning methods that operate on offline datasets, such as task-specific teleoperated demonstrations or on hindsight labeled robotic experience. Such methods work reasonably but rely strongly on the assumption of clean data: teleoperated demonstrations are collected with specific tasks in mind, while hindsight language descriptions rely on expensive human labeling. Recently, large-scale pretrained language and vision-language models like CLIP have been applied to robotics in the form of learning representations and planners. However, can these pretrained models also be used to cheaply impart internet-scale knowledge onto offline datasets, providing access to skills contained in the offline dataset that weren't necessarily reflected in ground truth labels? We investigate fine-tuning a reward model on a small dataset of robot interactions with crowd-sourced natural language labels and using the model to relabel instructions of a large offline robot dataset. The resulting dataset with diverse language skills is used to train imitation learning policies, which outperform prior methods by up to 30% when evaluated on a diverse set of novel language instructions that were not contained in the original dataset. View details
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
Jeongeun Park
Seungwon Lim
Joonhyung Lee
Sangbeom Park
Sungjoon Choi
Youngjae Yu
IEEE Robotics and Automation Letters (2023) (to appear)
Preview abstract In this paper, we focus on inferring whether the given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish it between ambiguous or infeasible commands leveraging LLMs with situational aware few-shot prompting in a zero-shot manner. For ambiguous commands, we further disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could lead to a decrease in malfunction and undesired actions of the robot, enhancing the reliability of interactive robot agents. To evaluate the proposed system, we present a dataset consisting pair of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset, pick-and-place tabletop simulation. Furthermore, we demonstrate the approach in a real-world human-robot interaction environment, i.e., handover scenarios. View details
A Connection between Actor Regularization and Critic Regularization in Reinforcement Learning
Benjamin Eysenbach
Matthieu Geist
Ruslan Salakhutdinov
Sergey Levine
International Conference on Machine Learning (ICML) (2023)
Preview abstract As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting, with most methods regularizing either the actor or the critic. These methods appear distinct. Actor regularization (e.g., behavioral cloning penalties) is simpler and has appealing convergence properties, while critic regularization typically requires significantly more compute because it involves solving a game, but it has appealing lower-bound guarantees. Empirically, prior work alternates between claiming better results with actor regularization and critic regularization. In this paper, we show that these two regularization techniques can be equivalent under some assumptions: regularizing the critic using a CQL-like objective is equivalent to updating the actor with a BC- like regularizer and with a SARSA Q-value (i.e., “1-step RL”). Our experiments show that this theoretical model makes accurate, testable predictions about the performance of CQL and one-step RL. While our results do not definitively say whether users should prefer actor regularization or critic regularization, our results hint that actor regularization methods may be a simpler way to achieve the desirable properties of critic regularization. The results also suggest that the empirically- demonstrated benefits of both types of regularization might be more a function of implementation details rather than objective superiority. View details
Single-Level Differentiable Contact Simulation
Simon Le Cleac'h
Mac Schwager
Zachary Manchester
Pete Florence
Sumeet Singh
IEEE RAL (2023)
Preview abstract We present a differentiable formulation of rigid-body contact dynamics for objects and robots represented as compositions of convex primitives. Existing optimization-based approaches simulating contact between convex primitives rely on a bilevel formulation that separates collision detection and contact simulation. These approaches are unreliable in realistic contact simulation scenarios because isolating the collision detection problem introduces contact location non-uniqueness. Our approach combines contact simulation and collision detection into a unified single-level optimization problem. This disambiguates the collision detection problem in a physics-informed manner. Compared to previous differentiable simulation approaches, our formulation features improved simulation robustness and computational complexity improved by more than an order of magnitude. We provide a numerically efficient implementation of our formulation in the Julia language called \href{https://github.com/simon-lc/DojoLight.jl}{DojoLight.jl}. View details
Bootstrap Your Own Skills: Learning to Solve New Tasks with Large Language Model Guidance
Jesse Zhang
Jiahui Zhang
Karl Pertsch
Ziyi Liu
Xiang Ren
Shao-Hua Sun
Joseph Lim
Conference on Robot Learning 2023 (2023)
Preview abstract We propose BOSS, an approach that automatically learns to solve new long-horizon, complex, and meaningful tasks by autonomously growing a learned skill library. Prior work in reinforcement learning require expert supervision, in the form of demonstrations or rich reward functions, to learn long-horizon tasks. Instead, our approach BOSS (BOotStrapping your own Skills) learns to accomplish new tasks by performing “skill bootstrapping,” where an agent with a set of primitive skills interacts with the environment to practice new skills without receiving reward feedback for tasks outside of the initial skill set. This bootstrapping phase is guided by large language models (LLMs) that inform the agent of meaningful skills to chain together. Through this process, BOSS builds a wide range of complex and useful behaviors from a basic set of primitive skills. We demonstrate through experiments in realistic household environments that agents trained with our LLM-guided bootstrapping procedure outperform those trained with naive bootstrapping as well as prior unsupervised skill acquisition methods on zero-shot execution of unseen, long-horizon tasks in new environments View details
Agile Catching with Whole-Body MPC and Blackbox Policy Learning
Saminda Abeyruwan
Nick Boffi
Anish Shankar
Sumeet Singh
Jean-Jacques Slotine
Stephen Tu
Learning for Dynamics and Control (2023)
Preview abstract We address a benchmark task in agile robotics: catching objects thrown at high-speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and wholebody multimodality via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching. View details

Some of our teams