Exploring Nature-Inspired Robot Agility

April 3, 2020

Posted by Xue Bin (Jason) Peng, Student Researcher and Sehoon Ha, Research Scientist, Robotics at Google

Whether it’s a dog chasing after a ball or a horse jumping over obstacles, animals can effortlessly perform an incredibly rich repertoire of agile skills. Developing robots that are able to replicate these agile behaviors can open opportunities to deploy robots for sophisticated tasks in the real world. But designing controllers that enable legged robots to perform these agile behaviors can be a very challenging task. While reinforcement learning (RL) is an approach often used for automating development of robotic skills, a number of technical hurdles remain and, in practice, there is still substantial manual overhead. Designing reward functions that lead to effective skills can itself require a great deal of expert insight, and often involves a lengthy reward tuning process for each desired skill. Furthermore, applying RL to legged robots requires not only efficient algorithms, but also mechanisms to enable the robots to remain safe and recover after falling, without frequent human assistance.

In this post, we will discuss two of our recent projects aimed at addressing these challenges. First, we describe how robots can learn agile behaviors by imitating motions from real animals, producing fast and fluent movements like trotting and hopping. Then, we discuss a system for automating the training of locomotion skills in the real world, which allows robots to learn to walk on their own, with minimal human assistance.

Learning Agile Robotic Locomotion Skills by Imitating Animals
In “Learning Agile Robotic Locomotion Skills by Imitating Animals”, we present a framework that takes a reference motion clip recorded from an animal (a dog, in this case) and uses RL to train a control policy that enables a robot to imitate the motion in the real world. By providing the system with different reference motions, we are able to train a quadruped robot to perform a diverse set of agile behaviors, ranging from fast walking gaits to dynamic hops and turns. The policies are trained primarily in simulation, and then transferred to the real world using a latent space adaptation technique that can efficiently adapt a policy using only a few minutes of data from the real robot.

Motion Imitation
We start by collecting motion capture clips of a real dog performing various locomotion skills. We then use RL to train a control policy to imitate the dog’s motions. The policies are trained in a physics simulation to track the pose of the reference motion at each timestep. By using different reference motions in the reward function, we can train a simulated robot to imitate a variety of different skills.
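To make the pose-tracking objective above more concrete, here is a minimal sketch of a per-timestep imitation reward. It assumes an exponentiated squared-error form, which is a common choice for tracking rewards; the exact terms and weights used in the paper are not given in this post. The function names, the joint-angle representation, and the `scale` constant are all hypothetical.

```python
import math


def pose_tracking_reward(robot_pose, reference_pose, scale=5.0):
    """Reward in (0, 1] that is largest when the robot matches the reference.

    robot_pose / reference_pose: equal-length sequences of joint angles (radians).
    scale: hypothetical sharpness constant, not a value from the post.
    """
    if len(robot_pose) != len(reference_pose):
        raise ValueError("poses must have the same number of joints")
    squared_error = sum(
        (q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose)
    )
    return math.exp(-scale * squared_error)


def episode_return(robot_trajectory, reference_clip):
    """Sum the per-timestep tracking reward over a reference clip."""
    return sum(
        pose_tracking_reward(q, q_ref)
        for q, q_ref in zip(robot_trajectory, reference_clip)
    )
```

Swapping in a different `reference_clip` changes the skill being rewarded without changing the reward code, which is the property the paragraph above relies on.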

Four side-by-side animated panels compare simulated reference motions against executed robotic policies for pacing and spinning actions.

Reinforcement learning is used to train a simulated robot to imitate the reference motions from a dog. All simulations are performed using PyBullet.

However, since simulators generally provide only a coarse approximation of the real world, policies trained in simulation often perform poorly when deployed on a real robot. Therefore, we use a sample-efficient latent space adaptation technique to transfer a policy trained in simulation to the real world.

First, to encourage the policy to learn behaviors that are robust to variations in the dynamics, we randomize the dynamics of the simulation by varying physical quantities, such as the robot’s mass and friction. Since we have access to the values of these parameters during training in simulation, we can also map them to a low-dimensional representation using a learned encoder. This encoding is then passed as an additional input to the policy during training. Since the physical parameters of the real robot are not known a priori, when deploying the policy to a real robot, we remove the encoder and directly search for a set of parameters in the latent space that enables the robot to successfully execute the desired skills in the real world. This technique is often able to adapt a policy to the real world using less than 8 minutes of real-world data.
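The two steps above, randomizing dynamics during training and then searching the latent space at deployment, can be sketched as follows. The post does not specify the search procedure, so this sketch swaps in a simple cross-entropy-style search as an illustration. The parameter names and ranges, the population sizes, and the `toy_return` stand-in for a real-robot episode are all assumptions.

```python
import random


def sample_dynamics(rng):
    """Domain randomization: draw one set of simulated physical parameters.

    The parameter names and ranges are hypothetical, not values from the post.
    """
    return {
        "mass_scale": rng.uniform(0.8, 1.2),
        "friction": rng.uniform(0.5, 1.25),
    }


def search_latent(evaluate_return, latent_dim, rng,
                  iterations=5, population=8, num_elite=3):
    """Cross-entropy-style search for a latent code with high measured return.

    evaluate_return(z) stands in for running the policy conditioned on z
    for one episode and returning that episode's total reward.
    """
    mean = [0.0] * latent_dim
    std = [1.0] * latent_dim
    best_z = list(mean)
    best_return = evaluate_return(best_z)
    for _ in range(iterations):
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(population)]
        scored = sorted(((evaluate_return(z), z) for z in candidates),
                        key=lambda pair: pair[0], reverse=True)
        elites = [z for _, z in scored[:num_elite]]
        # Refit the sampling distribution to the best-performing codes.
        mean = [sum(z[i] for z in elites) / num_elite
                for i in range(latent_dim)]
        std = [max(1e-3, (sum((z[i] - mean[i]) ** 2 for z in elites)
                          / num_elite) ** 0.5)
               for i in range(latent_dim)]
        if scored[0][0] > best_return:
            best_return, best_z = scored[0]
    return best_z, best_return


def toy_return(z, hidden_optimum=(0.7, -0.3)):
    """Toy stand-in for a real-robot episode: peaks at an unknown latent code."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, hidden_optimum))
```

Calling `search_latent(toy_return, 2, random.Random(0))` uses 41 evaluations with these defaults. On a real robot each evaluation would be one episode, so the budget has to stay small, which is why the search runs over a low-dimensional latent code rather than over the policy weights.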

Six animated panels contrast simulated quadruped reference motions against actual physical robot performance before and after adaptation.

Comparison of policies before and after adaptation on the real robot. Before adaptation, the robot is prone to falling. But after adaptation, the policies are able to more consistently execute the desired skills.

Results
Using this approach, the robot learns to imitate various locomotion skills from a dog, including different walking gaits, such as pacing and trotting, as well as an agile spinning motion.

Four pairs of animated panels compare simulated quadruped reference motions against physical robot execution across various gaits.

Robot imitating various skills from a dog.

In addition to imitating motions from real dogs, it is also possible to imitate artist-animated keyframe motions, including a dynamic hop-turn:

Four animated panels contrast a simulated quadruped robot performing side-steps and turns against a physical robot replicating those actions.

Quadruped robot smoothly trots across a carpeted laboratory floor tethered by a single trailing power or data cable.

Skills learned by imitating artist-animated keyframe motions: side-steps, turn, and hop-turn.

More details are available in the following video:

Learning to Walk in the Real World with Minimal Human Effort
The above approach is able to train policies in simulation and then adapt them to the real world. However, when the task involves complex and diverse physical phenomena, it is also necessary to directly learn from real-world experience. Although learning on real robots has achieved state-of-the-art performance for manipulation tasks (e.g., QT-Opt), applying the same methods to legged robots is difficult since the robot may fall and damage itself, or leave the training area, which can then require human intervention.

Split screen showing a quadruped robot walking out of its workspace alongside a separate instance where it falls over.

An automated learning system for legged robots must resolve safety and automation challenges.

In “Learning to Walk in the Real World with Minimal Human Effort”, we developed an automated learning system that combines a multi-task learning procedure, a safety-constrained learner, and several carefully designed hardware and software components. Multi-task learning prevents the robot from leaving the training area by generating a learning schedule that drives the robot towards the center of the workspace. We also reduce the number of falls by designing a safety constraint, which we solve with dual gradient descent.

For each roll-out, the scheduler selects a task in which the desired walking direction is pointing towards the center. For instance, assuming we have two tasks, forward and backward walking, the scheduler will select the forward task if the robot is at the back of the workspace, and vice-versa for the backward task. In the middle of the episode, the learner takes dual gradient descent steps to iteratively optimize both the task objective and safety constraints, rather than treating them as a single goal. If the robot has fallen, we invoke an automated get-up controller and proceed to the next episode.
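The scheduling rule described above can be sketched in a few lines. This is an illustration only: it assumes a 2D workspace frame in which the robot's heading is fixed along the +x axis, so that "forward" always means +x, and the task table and function name are hypothetical.

```python
def select_task(robot_position, tasks, center=(0.0, 0.0)):
    """Pick the task whose walking direction points most toward the center.

    robot_position: (x, y) of the robot in the workspace frame.
    tasks: mapping from task name to a walking direction (dx, dy) in that frame.
    """
    to_center = (center[0] - robot_position[0], center[1] - robot_position[1])

    def alignment(direction):
        # Dot product: larger when the direction points toward the center.
        return direction[0] * to_center[0] + direction[1] * to_center[1]

    return max(tasks, key=lambda name: alignment(tasks[name]))


# Hypothetical two-task setup matching the forward/backward example.
TASKS = {"forward": (1.0, 0.0), "backward": (-1.0, 0.0)}
```

With the robot at the back of the workspace, for example at `(-1.0, 0.0)`, the scheduler picks `"forward"`. At the front, `(1.0, 0.0)`, it picks `"backward"`. A real system would also need to account for the robot's measured heading.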

Flowchart detailing a multi-task reinforcement learning loop with a replay buffer, where a robot rollout failure triggers an automatic reset.

We solve automation and safety challenges with multi-task learning, a safety-constrained SAC algorithm, and an automatic reset controller.
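To illustrate the dual gradient descent idea mentioned above, here is a toy sketch on a one-dimensional constrained problem rather than on the actual SAC objective. It alternates a gradient step on the decision variable with a gradient step on a Lagrange multiplier, so the objective and the constraint are handled separately instead of being folded into one hand-tuned reward. The objective, the constraint, and the learning rates are all made up for illustration.

```python
def dual_gradient_descent(steps=2000, primal_lr=0.05, dual_lr=0.05):
    """Maximize r(x) = -(x - 2)^2 subject to the constraint c(x) = x - 1 <= 0.

    The unconstrained optimum x = 2 violates the constraint, so the
    constrained solution is x = 1 with multiplier lam = 2.
    """
    x = 0.0
    lam = 0.0  # Lagrange multiplier for the constraint; must stay >= 0.
    for _ in range(steps):
        # Primal step: ascend the Lagrangian r(x) - lam * c(x) in x.
        grad_x = -2.0 * (x - 2.0) - lam
        x += primal_lr * grad_x
        # Dual step: raise lam while the constraint is violated,
        # lower it (down to zero) while the constraint is satisfied.
        lam = max(0.0, lam + dual_lr * (x - 1.0))
    return x, lam
```

Running `dual_gradient_descent()` settles near `x = 1` and `lam = 2`. The multiplier grows only as much as is needed to hold the constraint, which plays the role of an automatically tuned penalty weight.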

Results
This framework successfully trains policies from scratch to walk in different directions without any human intervention.

Three sequential panels demonstrating a quadruped robot progressing from clumsy initial exploration to confident continuous walking after 72 minutes.

Snapshots of the training process on the flat surface with zero human resets.

Once trained, the robot can be steered with a remote controller. Notice that the controller can command the robot to turn in place. This action would be difficult to design manually due to the planar leg structure of the robot, but it is discovered automatically by our automated multi-task learner.

Animated demonstration of a human manually steering a quadruped robot around a laboratory floor using a standard video game controller.

We train locomotion policies to walk in four directions, which allow us to interactively control the robot with a game controller.

The system also enables the robot to navigate more challenging surfaces, such as a memory foam mattress and a doormat with crevices.

Split screen comparing a quadruped robot walking forward on a blue mat against walking backward on a black textured mat.

Learned locomotion gaits on challenging terrains.

More details can be found in the following video:

Conclusion
In these two papers, we present methods to reproduce a diverse corpus of behaviors with quadruped robots. Extending this line of work to learn skills from videos would be an exciting direction, since it could substantially increase the volume of data from which robots can learn. We are also interested in applying the automated training system to more complex real-world environments and tasks.

Acknowledgments
We would like to thank our coauthors, Erwin Coumans, Tingnan Zhang, Tsang-Wei Lee, Jie Tan, Sergey Levine, Peng Xu and Zhenyu Tan. We would also like to thank Julian Ibarz, Byron David, Thinh Nguyen, Gus Kouretas, Krista Reymann, and Bonny Ho for their support and contributions to this work.
