Jie Tan

I joined the Brain team at Google in 2016, working on deep learning, reinforcement learning and robotics. Before that, I was a Member of Technical Staff in the Computational Imaging group at Lytro, working on computer vision, SLAM, light field technology and image processing. I received my PhD in computer science from Georgia Tech in 2015, under the supervision of Greg Turk and Karen Liu.

My research focused on developing computational tools to understand, simulate and control human and animal motions in complex environments. I developed fast and stable computer programs to simulate complex dynamic systems, such as fluids, soft bodies and articulated rigid bodies. I applied optimal control and machine learning techniques to enable computers to automatically learn skills inside a complex physical environment.

I am also interested in transferring the control policies that are learned in simulations to real robots. Policies learned in a simulation usually perform poorly on real robots due to the discrepancies between the simulated and the real system. I am developing tools to understand and model such discrepancies. I augmented the physical simulation using real-world data, which not only increases the simulation accuracy, but also improves the real-world performance of the controllers.
Authored Publications
    Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, a dramatic improvement over the prior result of 40% success.
    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
    Anthony G. Francis
    Dmitry Kalashnikov
    Edward Lee
    Jake Varley
    Leila Takayama
    Mikael Persson
    Peng Xu
    Stephen Tu
    Xuesu Xiao
    Conference on Robot Learning (2022) (to appear)
    Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints of Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers, a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves a 40% better goal-reached rate in cluttered environments and 65% better sociability when navigating around humans.
    Learning Semantic-Aware Locomotion Skills from Human Demonstration
    Byron Boots
    Xiangyun Meng
    Yuxiang Yang
    Conference on Robot Learning (CoRL) (2022) (to appear)
    The semantics of the environment, such as the terrain type and property, reveals important information for legged robots to adjust their behaviors. In this work, we present a framework that learns semantics-adaptive gait controllers for quadrupedal robots. To facilitate learning, we separate the problem of gait planning and motor control using a hierarchical framework, which consists of a high-level image-conditioned gait policy and a low-level MPC-based motor controller. In addition, to ensure sample efficiency, we pre-train the perception model with an off-road driving dataset, and extract an embedding for downstream learning. To avoid policy evaluation in the noisy real world, we design a simple interface for human operation and learn from human demonstrations. Our framework learns to adjust the speed and gait of the robot based on terrain semantics, using 40 minutes of human demonstration data. We keep testing the performance of the controller on different trails. At the time of writing, the robot has walked 0.2 miles without failure.
    We propose an end-to-end framework to enable multipurpose assistive mobile robots to autonomously wipe tables and clean spills and crumbs. This problem is challenging, as it requires planning wiping actions with uncertain latent crumbs and spill dynamics over high-dimensional visual observations, while simultaneously guaranteeing constraints satisfaction to enable deployment in unstructured environments. To tackle this problem, we first propose a stochastic differential equation (SDE) to model crumbs and spill dynamics and absorption with the robot wiper. Then, we formulate a stochastic optimal control for planning wiping actions over visual observations, which we solve using reinforcement learning (RL). We then propose a whole-body trajectory optimization formulation to compute joint trajectories to execute wiping actions while guaranteeing constraints satisfaction. We extensively validate our table wiping approach in simulation and on hardware.
    Safe Reinforcement Learning for Legged Locomotion
    Jimmy Yang
    Peter J. Ramadge
    Sehoon Ha
    International Conference on Robotics and Automation (2022) (to appear)
    Designing control policies for legged locomotion is complex due to underactuation and discrete contact dynamics. To deal with this complexity, applying reinforcement learning to learn a control policy in the real world is a promising approach. However, safety is a bottleneck when robots need to learn in the real world. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy and a learner policy. The safe recovery policy takes over control when the learner policy violates safety constraints, and hands control back when there are no future safety violations. We design the safe recovery policy so that it ensures safety of legged locomotion while minimally interfering with the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance. We verify the proposed framework in three locomotion tasks on a simulated quadrupedal robot: catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods.
    Fast and Efficient Locomotion via Learned Gait Transitions
    Yuxiang Yang
    Erwin Coumans
    Byron Boots
    Conference on Robot Learning (2021)
    We focus on the problem of developing energy-efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework, in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use evolutionary strategies (ES) to train a high-level gait policy that specifies gait patterns of each foot, while the low-level convex MPC controller optimizes the motor commands so that the robot can walk at a desired velocity using that gait pattern. We test our learning framework on a quadruped robot and demonstrate automatic gait transitions, from walking to trotting and to fly-trotting, as the robot increases its speed. We show that the learned hierarchical controller consumes much less energy across a wide range of locomotion speeds than baseline controllers.
    Reinforcement learning provides an effective tool for robots to acquire diverse skills in an automated fashion. For safety and data generation purposes, control policies are often trained in a simulator and later deployed to the target environment, such as a real robot. However, transferring policies across domains is often a manual and tedious process. In order to bridge the gap between domains, it is often necessary to carefully tune and identify the simulator parameters or select the aspects of the simulation environment to randomize. In this paper, we design a novel, adversarial learning algorithm to tackle the transfer problem. We combine a classic, analytical simulator with a differentiable, state-action dependent system identification module that outputs the desired simulator parameters. We then train this hybrid simulator such that the output trajectory distributions are indistinguishable from a target domain collection. The optimized hybrid simulator can refine a sub-optimal policy without any additional target domain data. We show that our approach outperforms the domain-randomization and target-domain refinement baselines on two robots and six difficult dynamic tasks.
    Learning to walk on complex terrains with vision
    Ale Escontrela
    Erwin Johan Coumans
    Peng Xu
    Sehoon Ha
    Conference on Robot Learning (2021)
    Visual feedback is crucial for legged robots to safely and efficiently handle uneven terrains such as stairs. However, training robots to effectively consume high-dimensional visual input for locomotion is challenging. In this work, we propose a framework to train a vision-based locomotion controller for quadruped robots to traverse a variety of uneven environments. Our key idea is to model the locomotion controller as a hierarchical structure with a high-level vision policy and a low-level motion controller. The high-level vision policy takes as input the perceived vision inputs as well as robot states and outputs the desired foothold placement and base movement of the robot, which are realized by a low-level motion controller composed of a position controller for swing legs and an MPC-based torque controller for stance legs. We train the vision policy using Deep Reinforcement Learning and demonstrate our approach on a variety of uneven environments such as step-stones, stairs, pillars, and moving platforms. We also deploy our policy on a real quadruped robot to walk over a series of random step-stones.
    Imitation learning is a popular approach for training effective visual navigation policies. However, collecting expert demonstrations for a legged robot is less practical because the robot is hard to control, and it walks slowly and cannot run continuously for a long time. In this work, we propose a zero-shot imitation learning framework for training a visual navigation policy on a legged robot from human demonstration (third-person perspective) only, allowing for more cost-effective data collection with better navigation capability. However, imitation learning from third-person perspective demonstrations raises unique challenges. Human demonstrations are captured with different camera perspectives; therefore, we design a feature disentanglement network (FDN) that extracts perspective-agnostic state features. We reconstruct missing action labels either by building an inverse model of the robot's dynamics in the feature space and applying it to the demonstrations or by developing an efficient GUI to label human demonstrations. We take a model-based imitation learning approach for training a visual navigation policy from the perspective-agnostic, action-labeled demonstrations. We show that our framework can learn an effective visual navigation policy for a legged robot, Laikago, from expert demonstrations in both simulated and real-world environments. Our approach is zero-shot as the robot never tries to navigate a certain navigation path in the testing environment before the testing phase. We also justify our framework by performing an ablation study and comparing it with baseline algorithms.
    Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning
    Yuxiang Yang
    Wenbo Gao
    Chelsea Finn
    International Conference on Intelligent Robots and Systems (IROS) (2020) (to appear)
    Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real data.