Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

Ale Escontrela; Atil Iscen; Jason Peng; Ken Goldberg; Pieter Abbeel; Tingnan Zhang; Wenhao Yu

2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS (2022) (to appear)

Abstract

Training high-dimensional simulated agents with under-specified reward functions often leads to jerky and unnatural behaviors, which results in physically infeasible strategies that are generally ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement learning (RL) practitioners often utilize complex reward functions that encourage more physically plausible behaviors, in conjunction with tricks such as domain randomization to train policies that satisfy the user's style criteria and can be successfully deployed on real robots. Such an approach has been successful in the realm of legged locomotion, leading to state-of-the-art results. However, designing effective reward functions can be a labour-intensive and tedious tuning process, and these hand-designed rewards do not easily generalize across platforms and tasks. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. This learned style reward can be combined with a simple task reward to train policies that perform tasks using naturalistic strategies. These more natural strategies can also facilitate transfer to the real world. We build upon prior work in computer graphics and demonstrate that an adversarial approach to training control policies can produce behaviors that transfer to a real quadrupedal robot without requiring complex reward functions. We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd and leads to energy-efficient locomotion strategies with natural gait transitions.
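As a rough illustration of how a learned style reward might be combined with a simple task reward, the sketch below maps a discriminator score to a bounded style reward and mixes it with a task reward through a weighted sum. The abstract does not give the exact formulation, so everything here is an assumption: the least-squares-style mapping follows a form commonly used in earlier adversarial motion prior work, and the names `style_reward`, `combined_reward`, `w_task`, and `w_style`, along with the weight values, are hypothetical.

```python
def style_reward(disc_score: float) -> float:
    """Map a discriminator score to a style reward in [0, 1].

    Assumes a least-squares discriminator trained to output +1 on
    reference motion-capture transitions and -1 on policy transitions.
    This mapping is an assumption, not taken from the abstract.
    """
    return max(0.0, 1.0 - 0.25 * (disc_score - 1.0) ** 2)


def combined_reward(
    task_reward: float,
    disc_score: float,
    w_task: float = 0.5,   # hypothetical weight on the task reward
    w_style: float = 0.5,  # hypothetical weight on the style reward
) -> float:
    """Weighted sum of a simple task reward and the learned style reward."""
    return w_task * task_reward + w_style * style_reward(disc_score)


if __name__ == "__main__":
    # A transition the discriminator rates as close to the reference data
    # earns a high style reward; one it rates as policy-like earns none.
    for score in (1.0, 0.0, -1.0):
        print(score, style_reward(score), combined_reward(1.0, score))
```

Under these assumptions, the task term tells the policy what to do (for example, track a commanded velocity), while the style term rewards transitions that the discriminator cannot distinguish from the motion capture data.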

Research Areas

Machine intelligence
