Jump to Content
Jasmine Hsu

Jasmine Hsu

Applying deep learning to robotics at Google Brain.

Research Areas

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
    Alexander Herzog
    Alexander Toshkov Toshev
    Andy Zeng
    Anthony Brohan
    Brian Andrew Ichter
    Byron David
    Chelsea Finn
    Clayton Tan
    Diego Reyes
    Dmitry Kalashnikov
    Eric Victor Jang
    Fei Xia
    Jarek Liam Rettinghouse
    Jornell Lacanlale Quiambao
    Julian Ibarz
    Karol Hausman
    Kyle Alan Jeffrey
    Linda Luu
    Mengyuan Yan
    Michael Soogil Ahn
    Nicolas Sievers
    Noah Brown
    Omar Eduardo Escareno Cortes
    Peng Xu
    Peter Pastor Sampedro
    Rosario Jauregui Ruano
    Sally Augusta Jesmonth
    Sergey Levine
    Steve Xu
    Yao Lu
    Yevgen Chebotar
    Yuheng Kuang
    Conference on Robot Learning (CoRL) (2022)
    Preview abstract Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of executing long-horizon, abstract, natural-language tasks on a mobile manipulator. The project's website and the video can be found at \url{say-can.github.io}. View details
    Long-Range Indoor Navigation with PRM-RL
    Anthony Francis
    Marek Fiser
    Tsang-Wei Lee
    IEEE Transactions on Robotics (T-RO) (2020), pp. 19
    Preview abstract Long-range indoor navigation requires guiding robots with noisy sensors and controls through cluttered environments along paths that span a variety of buildings. We achieve this with PRM-RL, a hierarchical robot navigation method in which reinforcement learning agents that map noisy sensors to robot controls learn to solve short-range obstacle avoidance tasks, and then sampling-based planners map where these agents can reliably navigate in simulation; these roadmaps and agents are then deployed on robots, guiding them along the shortest path where the agents are likely to succeed. Here we use Probabilistic Roadmaps (PRMs) as the sampling-based planner, and AutoRL as the reinforcement learning method in the indoor navigation context. We evaluate the method in simulation for kinematic differential drive and kinodynamic car-like robots in several environments, and on differential-drive robots at three physical sites. Our results show PRM-RL with AutoRL is more successful than several baselines, is robust to noise, and can guide robots over hundreds of meters in the face of noise and obstacles in both simulation and on robots, including over 5.8 kilometers of physical robot navigation. View details
    Preview abstract Interest in derivative-free optimization (DFO) and "evolutionary strategies" (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they can match state of the art methods for policy optimization problems in Robotics. However, it is well known that DFO methods suffer from prohibitively high sampling complexity. They can also be very sensitive to noisy rewards and stochastic dynamics. In this paper, we propose a new class of algorithms, called Robust Blackbox Optimization (RBO). Remarkably, even if up to 23% of all the measurements are arbitrarily corrupted, RBO can provably recover gradients to high accuracy. RBO relies on learning gradient flows using robust regression methods to enable off-policy updates. On several MuJoCo robot control tasks, when all other RL approaches collapse in the presence of adversarial noise, RBO is able to train policies effectively. We also show that RBO can be applied to legged locomotion tasks including path tracking for quadruped robots. View details
    Preview abstract This paper addresses two challenges facing sampling-based kinodynamic motion planning: a way to identify good candidate states for local transitions and the subsequent computationally intractable steering between these candidate states. Through the combination of sampling-based planning, a Rapidly Exploring Randomized Tree (RRT) and an efficient kinodynamic motion planner through machine learning, we propose an efficient solution to long-range planning for kinodynamic motion planning. First, we use deep reinforcement learning to learn an obstacle-avoiding policy that maps a robot's sensor observations to actions, which is used as a local planner during planning and as a controller during execution. Second, we train a reachability estimator in a supervised manner, which predicts the RL policy's time to reach a state in the presence of obstacles. Lastly, we introduce RL-RRT that uses the RL policy as a local planner, and the reachability estimator as the distance function to bias tree-growth towards promising regions. We evaluate our method on three kinodynamic systems, including physical robot experiments. Results across all three robots tested indicate that RL-RRT outperforms state of the art kinodynamic planners in efficiency, and also provides a shorter path finish time than a steering function free method. The learned local planner policy and accompanying reachability estimator demonstrate transferability to the previously unseen experimental environments, making RL-RRT fast because the expensive computations are replaced with simple neural network inference. Video: https://youtu.be/dDMVMTOI8KY View details
    Preview abstract This paper focuses on the problem of learning 6-DOF grasping with a parallel jaw gripper in simulation. Compared to existing approaches that are specialized in three-dimensional grasping (i.e., top-down grasping or side-grasping), using a 6-DOF grasping model allows the robot to learn a richer set of grasping interactions given less physical constraints; hence, potentially enhancing the robustness of grasping and robot dexterity. However, learning 6-DOF grasping is challenging due to a high dimensional state space, difficulty in collecting large-scale data, and many variations of an object’s visual appearance (i.e., geometry, material, texture, and illumination). We propose the notion of a geometry-aware representation in grasping based on the assumption that knowledge of 3D geometry is at the heart of interaction. Our key idea is constraining and regularizing grasping interaction learning through 3D geometry prediction. View details
    Time-Contrastive Networks: Self-Supervised Learning from Video
    Corey Lynch
    Yevgen Chebotar
    Eric Jang
    Stefan Schaal
    Sergey Levine
    Proceedings of International Conference in Robotics and Automation (ICRA 2018) + Deep Learning for Robotic Vision (DLRV) Workshop at CVPR 2017 + Deep Reinforcement Learning Symposium at NIPS 2017 (2018)
    Preview abstract We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in two robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses. Imitation of human behavior requires a viewpoint-invariant representation that captures the relationships between end-effectors (hands or robot grippers) and the environment, object attributes, and body pose. We train our representations using a triplet loss, where multiple simultaneous viewpoints of the same observation are attracted in the embedding space, while being repelled from temporal neighbors which are often visually similar but functionally different. This signal causes our model to discover attributes that do not change across viewpoint, but do change across time, while ignoring nuisance variables such as occlusions, motion blur, lighting and background. We demonstrate that this representation can be used by a robot to directly mimic human poses without an explicit correspondence, and that it can be used as a reward function within a reinforcement learning algorithm. While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. Reward functions obtained by following the human demonstrations under the learned representation enable efficient reinforcement learning that is practical for real-world robotic systems. Video results, open-source code and dataset are available at https://sermanet.github.io/imitate View details
    No Results Found