Wenhao Yu

Wenhao Yu

I obtained my PhD in Computer Science from Georgia Institute of Technology advised by Karen Liu and Greg Turk. ​My research lies at the intersection of computer graphics, robotics, and machine learning. I develop learning-based algorithms for acquiring motor skills for animated characters and transfer learning algorithms for transferring learned skills from simulation to real robots and novel environments. My research goal is to enable simulated characters and real robots to learn highly-dynamic movements in complex environments in an automatic, efficient, and generalizable way.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Style-Augmented Mutual Information for Practical Skill Discovery
    Ale Escontrela
    Jason Peng
    Ken Goldberg
    Pieter Abbeel
    Proceedings of NeurIPS(2022) (to appear)
    Preview abstract Exploration and skill discovery in many real-world settings is often inspired by the activities we see others perform. However, most unsupervised skill discovery methods tend to focus solely on the intrinsic component of motivation, often by maximizing the Mutual Information (MI) between the agent's skills and the observed trajectories. These skills, though diverse in the behaviors they elicit, leave much to be desired. Namely, skills learned by maximizing MI in a high-dimensional continuous control setting tend to be aesthetically unpleasing and challenging to utilize in a practical setting, as the violent behavior often exhibited by these skills would not transfer well to the real world. We argue that solely maximizing MI is insufficient if we wish to discover useful skills, and that a notion of "style" must be incorporated into the objective. To this end, we propose the Style-Augmented Mutual Information objective (SAMI), whereby - in addition to maximizing a lower-bound of the MI - the agent is encouraged to minimize the f-divergence between the policy-induced trajectory distribution and the trajectory distribution contained in the reference data (the style objective). We compare SAMI to other popular skill discovery objectives, and demonstrate that skill-conditioned policies optimized with SAMI achieve equal or greater performance when applied to downstream tasks. We also show that the data-driven motion prior specified by the style objective can be inferred from various modalities, including large motion capture datasets or even RGB videos. View details
    Safe Reinforcement Learning for Legged Locomotion
    Jimmy Yang
    Peter J. Ramadge
    Sehoon Ha
    International Conference on Robotics and Automation(2022) (to appear)
    Preview abstract Designing control policies for legged locomotion is complex due to underactuation and discrete contact dynamics. To deal with this complexity, applying reinforcement learning to learn a control policy in the real world is a promising approach. However, safety is a bottleneck when robots need to learn in the real world. In this paper, we propose a safe reinforcement learning framework that switches between a safe recovery policy and a learner policy. The safe recovery policy takes over the control when the learner policy violates safety constraints, and hands over the control back when there are no future safety violations. We design the safe recovery policy so that it ensures safety of legged locomotion while minimally interfering with the learning process. Furthermore, we theoretically analyze the proposed framework and provide an upper bound on the task performance. We verify the proposed framework in three locomotion tasks on a simulated quadrupedal robot: catwalk, two-leg balance, and pacing. On average, our method achieves 48.6% fewer falls and comparable or better rewards than the baseline methods. View details
    Preview abstract Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving prior results achieving 40% success. View details
    Learning Semantic-Aware Locomotion Skills from Human Demonstration
    Byron Boots
    Xiangyun Meng
    Yuxiang Yang
    Conference on Robot Learning (CoRL) 2022(2022) (to appear)
    Preview abstract The semantics of the environment, such as the terrain type and property, reveals important information for legged robots to adjust their behaviors. In this work, we present a framework that learns semantics-adaptive gait controllers for quadrupedal robots. To facilitate learning, we separate the problem of gait planning and motor control using a hierarchical framework, which consists of a high-level image-conditioned gait policy and a low-level MPC-based motor controller. In addition, to ensure sample efficiency, we pre-train the perception model with an off-road driving dataset, and extract an embedding for downstream learning. To avoid policy evaluation in the noisy real world, we design a simple interface for human operation and learn from human demonstrations. Our framework learns to adjust the speed and gait of the robot based on terrain semantics, using 40 minutes of human demonstration data. We keep testing the performance of the controller on different trails. At the time of writing, the robot has walked 0.2 miles without failure. View details
    Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions
    Ale Escontrela
    Jason Peng
    Ken Goldberg
    Pieter Abbeel
    2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS(2022) (to appear)
    Preview abstract Training high-dimensional simulated agents with under-specified reward functions often leads to jerky and unnatural behaviors, which results in physically infeasible strategies that are generally ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement learning (RL) practitioners often utilize complex reward functions that encourage more physically plausible behaviors, in conjunction with tricks such as domain randomization to train policies that satisfy the user's style criteria and can be successfully deployed on real robots. Such an approach has been successful in the realm of legged locomotion, leading to state-of-the-art results. However, designing effective reward functions can be a labour-intensive and tedious tuning process, and these hand-designed rewards do not easily generalize across platforms and tasks. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. This learned style reward can be combined with a simple task reward to train policies that perform tasks using naturalistic strategies. These more natural strategies can also facilitate transfer to the real world. We build upon prior work in computer graphics and demonstrate that an adversarial approach to training control policies can produce behaviors that transfer to a real quadrupedal robot without requiring complex reward functions. We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions. View details
    Zero-Shot Retargeting of Learned Quadruped Locomotion Policy Using A Hybrid Kinodynamic Model and Predictive Control
    He Li
    Patrick Wensing
    2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)(2022) (to appear)
    Preview abstract As a rivaling control technique, Reinforcement Learning (RL) has demonstrated great performance in quadruped locomotion. However, it remains a challenge to reuse a policy on another robot, i.e., policy transferability, which saves time for retraining. In this work, we reduce the gap by devloping a planning-and-control framework that systematically integrates RL and Model Predictive Control (MPC). The planning stage employs RL to generate a dynamically-plausible trajectory as well as the contact schedule. These information are then used to seed the MPC in the low level to stabilize and robustify the motion. In addition, our MPC controller employs a novel Hybrid Kino-Dynamics (HKD) model which implicitly optimizes the foothold locations. The results are surprisingly good since the policy trained for the Unitree A1 robot could be transferred to the MIT Mini Cheetah with the proposed pipeline. View details
    Learning to walk on complex terrains with vision
    Ale Escontrela
    Erwin Johan Coumans
    Peng Xu
    Sehoon Ha
    Conference on Robotic Learning(2021)
    Preview abstract Visual feedback is crucial for legged robots to safely and efficiently handle uneven terrains such as stairs. However, effectively training robots to effectively consume high dimensional visual input for locomotion is challenging. In this work, we propose a framework to train a vision-based locomotion controller for quadruped robots to traverse a variety of uneven environments. Our key idea is to model the locomotion controller as a hierarchical structure with a high-level vision policy and a low-level motion controller. The high-level vision policy takes as input the perceived vision inputs as well as robot states and outputs desired foothold placement and base movement of the robot, which is realized by low level motion controller composed of a position controller for swing legs and a MPC-based torque controller for stance legs. We train the vision policy using Deep Reinforcement Learning and demonstrate our approach on a variety of uneven environments such as step-stones, stairs, pillars, and moving platforms. We also deploy our policy on a real quadruped robot to walk over a series of random step-stones. View details
    Protective Policy Transfer
    C. Karen Liu
    Greg Turk
    International Conference on Robotics and Automation(2021)
    Preview abstract Being able to transfer existing skills to new situations is a key ability for robots to operate in unpredictable real-world environments. A successful transfer algorithm should not only minimize the number of samples that the robot needs to collect in the new environment, but also prevent the robot from damaging itself or the surrounding environments during the transfer process. In this work, we introduce a policy transfer algorithm for adapting robot motor skills to novel scenarios while minimizing serious failures. Our algorithm trains two control policies in the training environment: a task policy that is optimized to complete the task of interest, and a protective policy that is dedicated to keep the robot from unsafe events (e.g. falling to the ground). To decide which policy to use during execution, we learn a safety estimator model in the training environment that estimates a continuous safety level of the robot. When used with a set of thresholds, the safety estimator becomes a classifier to switch between the protective policy and the task policy. We evaluate our approach on four simulated robot locomotion problems and a 2D navigation problem and show that our method can achieve successful transfer to notably different environments while taking safety into consideration. View details
    Learning Fast Adaptation with Meta Strategy Optimization
    Erwin Johan Coumans
    Sehoon Ha
    Learning Fast Adaptation with Meta Strategy Optimization(2020)
    Preview abstract The ability to walk in new situations is a key milestone on the path toward real-world applications of legged robots. In this work, we introduce a novel algorithm for training locomotion policies for legged robots that can quickly adapt to new scenarios with a handful of trials in the target environment. We extend the framework of strategy optimization that trains a control policy with additional latent parameters in the simulation and transfers to the real robot by optimizing the latent inputs. The key idea in our proposed algorithm, Meta Strategy Optimization (MSO), is to formulate the problem as a meta-learning process by exposing the same strategy optimization to both the training and testing phases. This change allows MSO to effectively learn locomotion skills as well as a latent space that is suitable for fast adaptation. We evaluate our method on a real quadruped robot and demonstrate successful adaptation in various scenarios, including sim-to-real transfer, walking with a weakened motor, or climbing up a slope. Furthermore, we analyze the generalization capability of the trained policy in simulated environments and show that our method outperforms previous methods in both simulated and real environments. View details