Jump to Content

Alex Irpan

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
    Alexander Herzog
    Alexander Toshkov Toshev
    Anthony Brohan
    Byron David
    Clayton Tan
    Diego Reyes
    Dmitry Kalashnikov
    Eric Victor Jang
    Jarek Liam Rettinghouse
    Jornell Lacanlale Quiambao
    Julian Ibarz
    Kyle Alan Jeffrey
    Linda Luu
    Mengyuan Yan
    Michael Soogil Ahn
    Nicolas Sievers
    Noah Brown
    Omar Eduardo Escareno Cortes
    Peng Xu
    Peter Pastor Sampedro
    Rosario Jauregui Ruano
    Sally Augusta Jesmonth
    Steve Xu
    Yao Lu
    Yevgen Chebotar
    Yuheng Kuang
    Conference on Robot Learning 2022 (2022)
    Preview abstract Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of executing long-horizon, abstract, natural-language tasks on a mobile manipulator. The project's website and the video can be found at \url{say-can.github.io}. View details
    Actionable Models: Unsupervised Offline Learning of Robotic Skills
    Benjamin Eysenbach
    Dmitry Kalashnikov
    Jake Varley
    Yao Lu
    Yevgen Chebotar
    International Conference on Machine Learning 2021 (2021)
    Preview abstract We consider the problem of learning useful robotic skills from previously collected offline data without access to manually specified rewards or additional online exploration, a setting that is becoming increasingly important for scaling robot learning by reusing past robotic data. In particular, we propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset. We employ goal-conditioned Q-learning with hindsight relabeling and develop several techniques that enable training in a particularly challenging offline setting. We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects. We also show that our method can learn to reach long-horizon goals across multiple episodes, and learn rich representations that can help with downstream tasks through pre-training or auxiliary objectives. View details
    AW-Opt: Learning Robotic Skills with Imitationand Reinforcement at Scale
    Yao Lu
    Yevgen Chebotar
    Mengyuan Yan
    Eric Victor Jang
    Alexander Herzog
    Mohi Khansari
    Dmitry Kalashnikov
    Conference on Robot Learning 2021 (2021)
    Preview abstract This paper proposes a new algorithm "AW-Opt" to combine Imitation Learning (IL) and Reinforcement Learning (RL). Prior methods face significant difficulty with sparse reward, image based input robotics tasks. By carefully designing sample filtering strategy, exploration strategy, and bellman equation, AW-Opt outperforms existing SOTA algorithms. Experimental results in both simulation and with real robots show that AW-Opt can achieve reasonable success rate from initial demonstrations, maintain low inference time, fine tune to reach SOTA success rate and use much less samples than existing algorithms. View details
    BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning
    Corey Harrison Lynch
    Daniel Kappler
    Eric Victor Jang
    Frederik Ebert
    Mohi Khansari
    Conference on Robot Learning (2021)
    Preview abstract In this paper, we study the problem of enabling a vision-based robotic manipulation system to generalize across diverse scenes and diverse tasks, a long-standing challenge in robot learning. We approach the above challenge from an imitation learning perspective, aiming to study how scaling and broadening the data collected can facilitate generalization to new scenes and tasks. To that end, we develop a shared-autonomy system for demonstrating correct behavior to the robot along with an imitation learning method that can flexibly condition on task embeddings computed from language or video. Using this system, we scale data collection to dozens of scenes and over 100 tasks, and investigate how various design choices translate to performance. We show that our system enables a real robot, using the same neural network architecture for learning policies, to pick objects from a bin at 4 objects a minute, open swing doors and latched doors it has never seen before (success rates of 94% and 27%), and perform at least dozens of unseen manipulation tasks with a success rate of 50%. View details
    Preview abstract Robots trained via reinforcement-learning (RL) requirecollecting and labeling many real-world episodes, whichmay be costly and time-consuming. Training models with alarge amount of simulation is a cheaper alternative. How-ever, simulations are not perfect and such models may nottransfer to the real world. Techniques developed to closethis simulation-to-reality (Sim2Real) gap typically applyrandomization to the simulated images or adapt them withan additional Sim2Real model. A Generative Adversar-ial network (GAN) may be used to adapt the pixels of thesimulated image to be more realistic before use by a deepRL model. We find the CycleGAN which enforces a cycleconsistency between Sim2Real and Real2Sim adaptationsproduces better images for RL than a GAN alone. Ulti-mately, we develop RL-CycleGAN which includes a Cycle-GAN which trains jointly with the deep RL model and en-forces that the RL model is consistent across all the adap-tations.We evaluate the RL-CycleGAN on two vision-based robotics grasping tasks and compare it to previoustechniques. With 580,000 real episodes and millions ofsimulated episodes adapted with RL-CycleGAN achievesxx% grasp success, while a previous GAN-based approach,GraspGAN, achieves xx% grasp success. With only 5,000real episodes, RL-CycleGAN and GraspGAN achieve xx%and xx% grasp success respectively. On a multi-bin grasp-ing task, we show RL-CycleGAN drastically improves dataefficiency requiring 1/xth the amount of real data to reachthe same grasping performance. View details
    Noise Contrastive Priors for Functional Uncertainty
    Danijar Hafner
    Timothy Lillicrap
    James Davidson
    UAI (2019)
    Preview abstract Obtaining reliable uncertainty estimates of neural network predictions is a long standing challenge. Bayesian neural networks have been proposed as a solution, but it remains open how to specify their prior. In particular, the common practice of an independent normal prior in weight space imposes relatively weak constraints on the function posterior, allowing it to generalize in unforeseen ways on inputs outside of the training distribution. We propose noise contrastive priors (NCPs) to obtain reliable uncertainty estimates. The key idea is to train the model to output high uncertainty for data points outside of the training distribution. NCPs do so using an input prior, which adds noise to the inputs of the current mini batch, and an output prior, which is a wide distribution given these inputs. NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training. Empirically, we show that NCPs prevent overfitting outside of the training distribution and result in uncertainty estimates that are useful for active learning. We demonstrate the scalability of our method on the flight delays data set, where we significantly improve upon previously published results. View details
    Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
    Paul Wohlhart
    Matthew Kelcey
    Mrinal Kalakrishnan
    Laura Downs
    Julian Ibarz
    Peter Pastor Sampedro
    Kurt Konolige
    ICRA (2018)
    Preview abstract Instrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms is prohibitively expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically. Unfortunately, models trained purely on simulated data often fail to generalize to the real world. To address this shortcoming, prior work introduced domain adaptation algorithms that attempt to make the resulting models domain-invariant. However, such works were evaluated primarily on offline image classification datasets. In this work, we adapt these techniques for learning, primarily in simulation, robotic hand-eye coordination for grasping. Our approaches generalize to diverse and previously unseen real-world objects. We show that, by using synthetic data and domain adaptation, we are able to reduce the amounts of real--world samples required for our goal and a certain level of performance by up to 50 times. We also show that by using our suggested methodology we are able to achieve good grasping results by using no real world labeled data. View details
    Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
    Maithra Raghu
    Jacob Andreas
    Robert Kleinberg
    Jon Kleinberg
    ICML (2018)
    Preview abstract Deep reinforcement learning has seen many remarkable successes over the past few years. However, progress is hindered by challenges faced in reinforcement learning, such as large variability in performance, catastrophic forgetting, and overfitting to particular states. We propose Erdos-Selfridge-Spencer games as a reinforcement learning testbed. We focus in particular on one of the best-known games in this genre, Spencer’s attacker-defender game, also known as the “tenure game”. This game has several nice properties: it is (i) a low-dimensional, simply parametrized environment where (ii) there is a linear closed form solution for optimal behavior from any state, and (iii) the difficulty of the game can be tuned by changing environment parameters in an interpretable way. We compare several RL methods to the tenure game, examining their performance given varying environment difficulty and their generalization to environments outside the training set. View details
    QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
    Dmitry Kalashnikov
    Peter Pastor Sampedro
    Julian Ibarz
    Alexander Herzog
    Eric Jang
    Deirdre Quillen
    Ethan Holly
    Mrinal Kalakrishnan
    CORL (2018)
    Preview abstract In this paper, we study the problem of learning vision-based dynamic manipulation skills using a scalable reinforcement learning approach. We study this problem in the context of grasping, a longstanding challenge in robotic manipulation. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, our method enables closed-loop vision-based control, whereby the robot continuously updates its grasp strategy based on the most recent observations to optimize long-horizon grasp success. To that end, we introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects. Aside from attaining a very high success rate, our method exhibits behaviors that are quite distinct from more standard grasping systems: using only RGB vision-based perception from an over-the-shoulder camera, our method automatically learns regrasping strategies, probes objects to find the most effective grasps, learns to reposition objects and perform other non-prehensile pre-grasp manipulations, and responds dynamically to disturbances and perturbations. Supplementary experiment videos can be found at https://goo.gl/wQrYmc View details
    Learning Hierarchical Information Flow with Recurrent Neural Modules
    Danijar Hafner
    James Davidson
    Nicolas Heess
    NIPS (2017)
    Preview abstract We propose a deep learning model inspired by neocortical communication via the thalamus. Our model consists of recurrent neural modules that send features via a routing center, endowing the modules with the flexibility to share features over multiple time steps. We show that our model learns to route information hierarchically, processing input data by a chain of modules. We observe common architectures, such as feed forward neural networks and skip connections, emerging as special cases of our architecture, while novel connectivity patterns are learned for the text8 compression task. We demonstrate that our model outperforms standard recurrent neural networks on three sequential benchmarks. View details
    No Results Found