Yunfei Bai
Yunfei Bai lead Machine Learning infrastructure, data and delivery efforts in Project Starline at Google. Before that, he founded the simulation team for Everyday Robots Project at X (formerly Google [x]). He built and managed cross geolocation teams with software engineers, researchers, and technical artists. He led the team to collaborate with Google Brain and DeepMind on a dozen research projects, including Sim2Real, and PaLM-SayCan.
His research interests include deep learning, computer graphics, 3D computer vision, and robotics.
He received a Ph.D. degree in Computer Science from Georgia Institute of Technology in 2015, under the advice of Dr. C. Karen Liu. His thesis focuses on designing algorithms for synthesizing human motion of object manipulation. He was a member of Computer Graphics Lab in Georgia Tech.
He received a B.E. degree from Tsinghua University, China in 2010.
Research Areas
Authored Publications
Sort By
Scalable Multi-Sensor Robot Imitation Learning via Task-Level Domain Consistency
Armando Fuentes
Daniel Ho
Eric Victor Jang
Matt Bennice
Mohi Khansari
Nicolas Sievers
Yuqing Du
ICRA (2023) (to appear)
Preview abstract
Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. However, such approaches are often expensive and require vast amounts of real world training demonstrations. Additionally, they rely on a time-consuming evaluation process for identifying the best model to deploy in the real world. These challenges can be mitigated by simulation - by supplementing real world data with simulated demonstrations and using simulated evaluations to identify strong policies. However, this introduces the well-known ``reality gap'' problem, where simulator inaccuracies decorrelates performance in simulation from reality. In this paper, we build on top of prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised contrastive loss that encourages sim and real alignment both at the feature and action-prediction level. We demonstrate the effectiveness of our approach on the challenging task of latched-door opening with a 9 Degree-of-Freedom (DoF) mobile manipulator from raw RGB and depth images. While most prior work in vision-based manipulation operate from a fixed, third person view, mobile manipulation couples the challenges of locomotion and manipulation with greater visual diversity and action space complexity. We find that we are able to achieve 77% success on seen and unseen scenes, a +30% increase from the baseline, using only ~16 hours of teleoperation demonstrations in sim and real.
View details
Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Jarek Rettinghouse
Daniel Ho
Julian Ibarz
Sangeetha Ramesh
Matt Bennice
Alexander Herzog
Chuyuan Kelly Fu
Adrian Li
Kim Kleiven
Jeff Bingham
Yevgen Chebotar
David Rendleman
Wenlong Lu
Mohi Khansari
Mrinal Kalakrishnan
Ying Xu
Noah Brown
Khem Holden
Justin Vincent
Peter Pastor Sampedro
Jessica Lin
David Dovo
Daniel Kappler
Mengyuan Yan
Sergey Levine
Jessica Lam
Jonathan Weisz
Paul Wohlhart
Karol Hausman
Cameron Lee
Bob Wei
Yao Lu
Preview abstract
We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects, while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also consists of 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects.
View details
Preview abstract
Reinforcement learning provides an effective tool for robots to acquire diverse skills in an automated fashion.For safety and data generation purposes, control policies are often trained in a simulator and later deployed to the target environment, such as a real robot. However, transferring policies across domains is often a manual and tedious process. In order to bridge the gap between domains, it is often necessary to carefully tune and identify the simulator parameters or select the aspects of the simulation environment to randomize. In this paper, we design a novel, adversarial learning algorithm to tackle the transfer problem. We combine a classic, analytical simulator with a differentiable, state-action dependent system identification module that outputs the desired simulator parameters. We then train this hybrid simulator such that the output trajectory distributions are indistinguishable from a target domain collection. The optimized hybrid simulator can refine a sub-optimal policy without any additional target domain data. We show that our approach outperforms the domain-randomization and target-domain refinement baselines on two robots and six difficult dynamic tasks.
View details
Preview abstract
We propose a self-supervised approach for learning representations of objects from monocular videos and demonstrate it is particularly useful in situated settings such as robotics. The main contributions of this paper are: 1) a self-supervising objective trained with contrastive learning that can discover and disentangle object attributes from video without using any labels; 2) we leverage object self-supervision for online adaptation: the longer our online model looks at objects in a video, the lower the object identification error, while the offline baseline remains with a large fixed error; 3) to explore the possibilities of a system entirely free of human supervision, we let a robot collect its own data, train on this data with our self-supervise scheme, and then show the robot can point to objects similar to the one presented in front of it, demonstrating generalization of object attributes. An interesting and perhaps surprising finding of this approach is that given a limited set of objects, object correspondences will naturally emerge when using contrastive learning without requiring explicit positive pairs. Videos illustrating online object adaptation and robotic pointing are available at this address: https://sites.google.com/view/object-contrastive-networks/home
View details
Learning Fast Adaptation with Meta Strategy Optimization
Erwin Johan Coumans
Sehoon Ha
Learning Fast Adaptation with Meta Strategy Optimization (2020)
Preview abstract
The ability to walk in new situations is a key milestone on the path toward real-world applications of legged robots. In this work, we introduce a novel algorithm for training locomotion policies for legged robots that can quickly adapt to new scenarios with a handful of trials in the target environment. We extend the framework of strategy optimization that trains a control policy with additional latent parameters in the simulation and transfers to the real robot by optimizing the latent inputs. The key idea in our proposed algorithm, Meta Strategy Optimization (MSO), is to formulate the problem as a meta-learning process by exposing the same strategy optimization to both the training and testing phases. This change allows MSO to effectively learn locomotion skills as well as a latent space that is suitable for fast adaptation. We evaluate our method on a real quadruped robot and demonstrate successful adaptation in various scenarios, including sim-to-real transfer, walking with a weakened motor, or climbing up a slope. Furthermore, we analyze the generalization capability of the trained policy in simulated environments and show that our method outperforms previous methods in both simulated and real environments.
View details
Learning 6-DOF Grasping Interaction via Deep 3D Geometry-aware Representations
Xinchen Yan
Mohi Khansari
Abhinav Gupta
James Davidson
Honglak Lee
(2018)
Preview abstract
This paper focuses on the problem of learning 6-DOF grasping with a parallel jaw gripper in simulation. Compared to existing approaches that are specialized in three-dimensional grasping (i.e., top-down grasping or side-grasping), using a 6-DOF grasping model allows the robot to learn a richer set of grasping interactions given less physical constraints; hence, potentially enhancing the robustness of grasping and robot dexterity. However, learning 6-DOF grasping is challenging due to a high dimensional state space, difficulty in collecting large-scale data, and many variations of an object’s visual appearance (i.e., geometry, material, texture, and illumination). We propose the notion of a geometry-aware representation in grasping based on the assumption that knowledge of 3D geometry is at the heart of interaction. Our key idea is constraining and regularizing grasping interaction learning through 3D geometry prediction.
View details
Sim-to-Real: Learning Agile Locomotion For Quadruped Robots
Erwin Coumans
Danijar Hafner
Steven Bohez
RSS (2018)
Preview abstract
Designing agile locomotion for quadruped robots often requires extensive expertise and tedious manual tuning. In this paper, we present a system to automate this process by leveraging deep reinforcement learning techniques. Our system can learn quadruped locomotion from scratch with simple reward signals. In addition, users can provide an open loop reference to guide the learning process if more control over the learned gait is needed. The control policies are learned in a physical simulator and then deployed to real robots. In robotics, policies trained in simulation often does not transfer to the real world. We narrow this reality gap by improving the physical simulator and learning robust policies. We improve the simulation using system identification, developing an accurate actuator model and simulating latency. We learn robust controllers by randomizing the physical environments, adding perturbations and designing a compact observation space. We evaluate our system on two agile locomotion gaits: trotting and galloping. After learning in simulation, a quadruped robot can successfully perform both gaits in real world.
View details
Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
Paul Wohlhart
Matthew Kelcey
Mrinal Kalakrishnan
Laura Downs
Julian Ibarz
Peter Pastor Sampedro
Kurt Konolige
Sergey Levine
ICRA (2018)
Preview abstract
Instrumenting and collecting annotated visual grasping datasets to train modern machine learning algorithms is prohibitively expensive. An appealing alternative is to use off-the-shelf simulators to render synthetic data for which ground-truth annotations are generated automatically.
Unfortunately, models trained purely on simulated data often fail to generalize to the real world. To address this shortcoming, prior work introduced domain adaptation algorithms that attempt to make the resulting models domain-invariant. However, such works were evaluated primarily on offline image classification datasets. In this work, we adapt these techniques for learning, primarily in simulation, robotic hand-eye coordination for grasping. Our approaches generalize to diverse and previously unseen real-world objects.
We show that, by using synthetic data and domain adaptation, we are able to reduce the amounts of real--world samples required for our goal and a certain level of performance by up to 50 times. We also show that by using our suggested methodology we are able to achieve good grasping results by using no real world labeled data.
View details