Jump to Content


Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration and reinforcement, all ingredients of human learning that are still not well understood or exploited through the supervised approaches that dominate deep learning today. Our goal is to improve robotics via machine learning, and improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.

Recent Publications

Preview abstract In recent years, much progress has been made in learning robotic manipulation policies that can follow natural language instructions. Common approaches involve learning methods that operate on offline datasets, such as task-specific teleoperated demonstrations or on hindsight labeled robotic experience. Such methods work reasonably but rely strongly on the assumption of clean data: teleoperated demonstrations are collected with specific tasks in mind, while hindsight language descriptions rely on expensive human labeling. Recently, large-scale pretrained language and vision-language models like CLIP have been applied to robotics in the form of learning representations and planners. However, can these pretrained models also be used to cheaply impart internet-scale knowledge onto offline datasets, providing access to skills contained in the offline dataset that weren't necessarily reflected in ground truth labels? We investigate fine-tuning a reward model on a small dataset of robot interactions with crowd-sourced natural language labels and using the model to relabel instructions of a large offline robot dataset. The resulting dataset with diverse language skills is used to train imitation learning policies, which outperform prior methods by up to 30% when evaluated on a diverse set of novel language instructions that were not contained in the original dataset. View details
Preview abstract We address a benchmark task in agile robotics: catching objects thrown at high-speed. This is a challenging task that involves tracking, intercepting, and cradling a thrown object with access only to visual observations of the object and the proprioceptive state of the robot, all within a fraction of a second. We present the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization. We provide insights into various performance tradeoffs including sample efficiency, sim-to-real transfer, robustness to distribution shifts, and wholebody multimodality via extensive on-hardware experiments. We conclude with proposals on fusing “classical” and “learning-based” techniques for agile robot control. Videos of our experiments may be found here: https://sites.google.com/view/agile-catching. View details
Scalable Multi-Sensor Robot Imitation Learning via Task-Level Domain Consistency
Armando Fuentes
Eric Victor Jang
Matt Bennice
Mohi Khansari
Nicolas Sievers
Yuqing Du
ICRA (2023) (to appear)
Preview abstract Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. However, such approaches are often expensive and require vast amounts of real world training demonstrations. Additionally, they rely on a time-consuming evaluation process for identifying the best model to deploy in the real world. These challenges can be mitigated by simulation - by supplementing real world data with simulated demonstrations and using simulated evaluations to identify strong policies. However, this introduces the well-known ``reality gap'' problem, where simulator inaccuracies decorrelates performance in simulation from reality. In this paper, we build on top of prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised contrastive loss that encourages sim and real alignment both at the feature and action-prediction level. We demonstrate the effectiveness of our approach on the challenging task of latched-door opening with a 9 Degree-of-Freedom (DoF) mobile manipulator from raw RGB and depth images. While most prior work in vision-based manipulation operate from a fixed, third person view, mobile manipulation couples the challenges of locomotion and manipulation with greater visual diversity and action space complexity. We find that we are able to achieve 77% success on seen and unseen scenes, a +30% increase from the baseline, using only ~16 hours of teleoperation demonstrations in sim and real. View details
Preview abstract We present a differentiable formulation of rigid-body contact dynamics for objects and robots represented as compositions of convex primitives. Existing optimization-based approaches simulating contact between convex primitives rely on a bilevel formulation that separates collision detection and contact simulation. These approaches are unreliable in realistic contact simulation scenarios because isolating the collision detection problem introduces contact location non-uniqueness. Our approach combines contact simulation and collision detection into a unified single-level optimization problem. This disambiguates the collision detection problem in a physics-informed manner. Compared to previous differentiable simulation approaches, our formulation features improved simulation robustness and computational complexity improved by more than an order of magnitude. We provide a numerically efficient implementation of our formulation in the Julia language called \href{https://github.com/simon-lc/DojoLight.jl}{DojoLight.jl}. View details
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents
Jeongeun Park
Seungwon Lim
Joonhyung Lee
Sangbeom Park
Sungjoon Choi
Youngjae Yu
IEEE Robotics and Automation Letters (2023) (to appear)
Preview abstract In this paper, we focus on inferring whether the given user command is clear, ambiguous, or infeasible in the context of interactive robotic agents utilizing large language models (LLMs). To tackle this problem, we first present an uncertainty estimation method for LLMs to classify whether the command is certain (i.e., clear) or not (i.e., ambiguous or infeasible). Once the command is classified as uncertain, we further distinguish it between ambiguous or infeasible commands leveraging LLMs with situational aware few-shot prompting in a zero-shot manner. For ambiguous commands, we further disambiguate the command by interacting with users via question generation with LLMs. We believe that proper recognition of the given commands could lead to a decrease in malfunction and undesired actions of the robot, enhancing the reliability of interactive robot agents. To evaluate the proposed system, we present a dataset consisting pair of high-level commands, scene descriptions, and labels of command type (i.e., clear, ambiguous, or infeasible). We validate the proposed method on the collected dataset, pick-and-place tabletop simulation. Furthermore, we demonstrate the approach in a real-world human-robot interaction environment, i.e., handover scenarios. View details
Mechanical Search on Shelves with Efficient Stacking and Destacking of Objects
Huang Huang
Letian Fu
Michael Danielczuk
Chung Min Kim
Zachary Tam
Jeff Ichnowski
Brian Ichter
Ken Goldberg
The International Symposium of Robotics Research (ISRR) (2023)
Preview abstract Stacking increases storage efficiency in shelves, but the lack of visibility and accessibility makes the mechanical search problem of revealing and extracting target objects difficult for robots. In this paper, we extend the lateral-access mechanical search problem to shelves with stacked items and introduce two novel policies -- Distribution Area Reduction for Stacked Scenes (DARSS) and Monte Carlo Tree Search for Stacked Scenes (MCTSSS) -- that use destacking and restacking actions. MCTSSS improves on prior lookahead policies by considering future states after each potential action. Experiments in 1200 simulated and 18 physical trials with a Fetch robot equipped with a blade and suction cup suggest that destacking and restacking actions can reveal the target object with 82--100% success in simulation and 66--100% in physical experiments, and are critical for searching densely packed shelves. In the simulation experiments, both policies outperform a baseline and achieve similar success rates but take more steps compared with an oracle policy that has full state information. In simulation and physical experiments, DARSS outperforms MCTSSS in median number of steps to reveal the target, but MCTSSS has a higher success rate in physical experiments, suggesting robustness to perception noise. View details

Some of our teams