Jie Tan

I joined the Brain team at Google in 2016, working on deep learning, reinforcement learning and robotics. Before that, I was a Member of Technical Staff in the Computational Imaging group at Lytro, working on computer vision, SLAM, light field technology and image processing. I received my PhD in computer science from Georgia Tech in 2015, under the supervision of Greg Turk and Karen Liu.

My research focused on developing computational tools to understand, simulate and control human and animal motion in complex environments. I developed fast and stable computer programs to simulate complex dynamic systems, such as fluids, soft bodies and articulated rigid bodies. I applied optimal control and machine learning techniques to enable computers to automatically learn skills inside a complex physical environment.

I am also interested in transferring the control policies that are learned in simulations to real robots. Policies learned in a simulation usually perform poorly on real robots due to the discrepancies between the simulated and the real system. I am developing tools to understand and model such discrepancies. I augmented the physical simulation using real-world data, which not only increases the simulation accuracy, but also improves the real-world performance of the controllers.
Authored Publications
    Learning Semantic-Aware Locomotion Skills from Human Demonstration
    Byron Boots
    Xiangyun Meng
    Yuxiang Yang
    Conference on Robot Learning (CoRL) (2022) (to appear)
    The semantics of the environment, such as the terrain type and property, reveals important information for legged robots to adjust their behaviors. In this work, we present a framework that learns semantics-adaptive gait controllers for quadrupedal robots. To facilitate learning, we separate the problem of gait planning and motor control using a hierarchical framework, which consists of a high-level image-conditioned gait policy and a low-level MPC-based motor controller. In addition, to ensure sample efficiency, we pre-train the perception model with an off-road driving dataset, and extract an embedding for downstream learning. To avoid policy evaluation in the noisy real world, we design a simple interface for human operation and learn from human demonstrations. Our framework learns to adjust the speed and gait of the robot based on terrain semantics, using 40 minutes of human demonstration data. We keep testing the performance of the controller on different trails. At the time of writing, the robot has walked 0.2 miles without failure.
    Safe Reinforcement Learning for Legged Locomotion
    Jimmy Yang
    Peter J. Ramadge
    Sehoon Ha
    International Conference on Robotics and Automation (2022) (to appear)
    Designing control policies for legged locomotion is complex due to underactuation and discrete contact dynamics. To deal with this complexity, applying reinforcement learning to learn a control policy in the real world is a promising approach. However, safety is a bottleneck when robots need to learn in the real world. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy and a learner policy. The safe recovery policy takes over the control when the learner policy violates safety constraints, and hands over the control back when there are no future safety violations. We design the safe recovery policy so that it ensures safety of legged locomotion while minimally interfering with the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance. We verify the proposed framework in three locomotion tasks on a simulated quadrupedal robot: catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods.
    Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving prior results achieving 40% success.
    We propose an end-to-end framework to enable multipurpose assistive mobile robots to autonomously wipe tables and clean spills and crumbs. This problem is challenging, as it requires planning wiping actions with uncertain latent crumbs and spill dynamics over high-dimensional visual observations, while simultaneously guaranteeing constraints satisfaction to enable deployment in unstructured environments. To tackle this problem, we first propose a stochastic differential equation (SDE) to model crumbs and spill dynamics and absorption with the robot wiper. Then, we formulate a stochastic optimal control for planning wiping actions over visual observations, which we solve using reinforcement learning (RL). We then propose a whole-body trajectory optimization formulation to compute joint trajectories to execute wiping actions while guaranteeing constraints satisfaction. We extensively validate our table wiping approach in simulation and on hardware.
    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation
    Anthony G. Francis
    Dmitry Kalashnikov
    Edward Lee
    Jake Varley
    Leila Takayama
    Mikael Persson
    Peng Xu
    Stephen Tu
    Xuesu Xiao
    Conference on Robot Learning (2022) (to appear)
    Despite decades of research, existing navigation systems still face real-world challenges when being deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints of Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers, a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves 40% better goal reached in cluttered environments and 65% better sociability when navigating around humans.
    Learning to walk on complex terrains with vision
    Ale Escontrela
    Erwin Johan Coumans
    Peng Xu
    Sehoon Ha
    Conference on Robot Learning (2021)
    Visual feedback is crucial for legged robots to safely and efficiently handle uneven terrains such as stairs. However, training robots to effectively consume high-dimensional visual input for locomotion is challenging. In this work, we propose a framework to train a vision-based locomotion controller for quadruped robots to traverse a variety of uneven environments. Our key idea is to model the locomotion controller as a hierarchical structure with a high-level vision policy and a low-level motion controller. The high-level vision policy takes as input the perceived vision inputs as well as robot states and outputs desired foothold placement and base movement of the robot, which is realized by a low-level motion controller composed of a position controller for swing legs and an MPC-based torque controller for stance legs. We train the vision policy using Deep Reinforcement Learning and demonstrate our approach on a variety of uneven environments such as step-stones, stairs, pillars, and moving platforms. We also deploy our policy on a real quadruped robot to walk over a series of random step-stones.
    Reinforcement learning provides an effective tool for robots to acquire diverse skills in an automated fashion. For safety and data generation purposes, control policies are often trained in a simulator and later deployed to the target environment, such as a real robot. However, transferring policies across domains is often a manual and tedious process. In order to bridge the gap between domains, it is often necessary to carefully tune and identify the simulator parameters or select the aspects of the simulation environment to randomize. In this paper, we design a novel, adversarial learning algorithm to tackle the transfer problem. We combine a classic, analytical simulator with a differentiable, state-action dependent system identification module that outputs the desired simulator parameters. We then train this hybrid simulator such that the output trajectory distributions are indistinguishable from a target domain collection. The optimized hybrid simulator can refine a sub-optimal policy without any additional target domain data. We show that our approach outperforms the domain-randomization and target-domain refinement baselines on two robots and six difficult dynamic tasks.
    Fast and Efficient Locomotion via Learned Gait Transitions
    Yuxiang Yang
    Erwin Coumans
    Byron Boots
    Conference on Robot Learning (2021)
    We focus on the problem of developing energy efficient controllers for quadrupedal robots. Animals can actively switch gaits at different speeds to lower their energy consumption. In this paper, we devise a hierarchical learning framework, in which distinctive locomotion gaits and natural gait transitions emerge automatically with a simple reward of energy minimization. We use evolutionary strategies (ES) to train a high-level gait policy that specifies gait patterns of each foot, while the low-level convex MPC controller optimizes the motor commands so that the robot can walk at a desired velocity using that gait pattern. We test our learning framework on a quadruped robot and demonstrate automatic gait transitions, from walking to trotting and to fly-trotting, as the robot increases its speed. We show that the learned hierarchical controller consumes much less energy across a wide range of locomotion speed than baseline controllers.
    Model-based Reinforcement Learning for Decentralized Multiagent Rendezvous
    Rose E. Wang
    Dennis Lee
    Edward Lee
    Brian Andrew Ichter
    Conference on Robot Learning (CoRL) (2020)
    Collaboration requires agents to align their goals on the fly. Underlying the human ability to align goals with other agents is their ability to predict the intentions of others and actively update their own plans. We propose hierarchical predictive planning (HPP), a model-based reinforcement learning method for decentralized multiagent rendezvous. Starting with pretrained, single-agent point-to-point navigation policies and using noisy, high-dimensional sensor inputs like lidar, we first learn via self-supervision motion predictions of all agents on the team. Next, HPP uses the prediction models to propose and evaluate navigation subgoals for completing the rendezvous task without explicit communication among agents. We evaluate HPP in a suite of unseen environments, with increasing complexity and numbers of obstacles. We show that HPP outperforms alternative reinforcement learning, path planning, and heuristic-based baselines on challenging, unseen environments. Experiments in the real world demonstrate successful transfer of the prediction models from sim to real world without any additional fine-tuning. Altogether, HPP removes the need for a centralized operator in multiagent systems by combining model-based RL and inference methods, enabling agents to dynamically align plans.
    Learning Fast Adaptation with Meta Strategy Optimization
    Erwin Johan Coumans
    Sehoon Ha
    Learning Fast Adaptation with Meta Strategy Optimization (2020)
    The ability to walk in new situations is a key milestone on the path toward real-world applications of legged robots. In this work, we introduce a novel algorithm for training locomotion policies for legged robots that can quickly adapt to new scenarios with a handful of trials in the target environment. We extend the framework of strategy optimization that trains a control policy with additional latent parameters in the simulation and transfers to the real robot by optimizing the latent inputs. The key idea in our proposed algorithm, Meta Strategy Optimization (MSO), is to formulate the problem as a meta-learning process by exposing the same strategy optimization to both the training and testing phases. This change allows MSO to effectively learn locomotion skills as well as a latent space that is suitable for fast adaptation. We evaluate our method on a real quadruped robot and demonstrate successful adaptation in various scenarios, including sim-to-real transfer, walking with a weakened motor, or climbing up a slope. Furthermore, we analyze the generalization capability of the trained policy in simulated environments and show that our method outperforms previous methods in both simulated and real environments.