Hyperparameter Selection for Imitation Learning

Léonard Hussenot; Marcin Andrychowicz; Damien Vincent; Robert Dadashi; Anton Raichuk; Lukasz Piotr Stafiniak; Sertan Girgin; Raphaël Marinier; Nikola M Momchev; Sabela Ramos; Manu Orsini; Olivier Frederic Bachem; Matthieu Geist; Olivier Pietquin

ICML (2021)

Abstract

We address the problem of tuning hyperparameters (HPs) for imitation learning algorithms when the reward function underlying the expert's demonstrations cannot be observed at any time. Most of the imitation learning literature assumes that this reward function is available for HP selection, although this is not a realistic setting: if the reward function were available, it should be used directly for policy training, and imitation would not make sense. To tackle this largely ignored problem, we propose a number of possible proxies for the return and study them in an extensive empirical evaluation across different representative agents and benchmarks. We observe that, depending on the algorithm and the environment, some methods allow good performance to be achieved without using the unknown return.
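To make the idea of selecting HPs with a proxy concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's method: the abstract does not name the proxies studied, so this example assumes one conceivable proxy (mean squared error between predicted and expert actions on held-out demonstrations), a toy linear behavioral-cloning learner, and an L2 regularization strength as the only HP. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear_policy(states, actions, l2):
    """Toy behavioral cloning via ridge regression: actions ~ states @ W."""
    dim = states.shape[1]
    gram = states.T @ states + l2 * np.eye(dim)
    return np.linalg.solve(gram, states.T @ actions)

def action_mse(weights, states, actions):
    """Assumed proxy: MSE between predicted and expert actions (lower is better)."""
    return float(np.mean((states @ weights - actions) ** 2))

def select_hp(train, valid, candidates):
    """Pick the candidate HP with the lowest proxy score; no reward is used."""
    scores = {}
    for l2 in candidates:
        weights = fit_linear_policy(train[0], train[1], l2)
        scores[l2] = action_mse(weights, valid[0], valid[1])
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic "expert" demonstrations (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
true_w = rng.normal(size=(4, 2))
states = rng.normal(size=(200, 4))
actions = states @ true_w + 0.1 * rng.normal(size=(200, 2))

# Hold out part of the demonstrations to compute the proxy.
train = (states[:150], actions[:150])
valid = (states[150:], actions[150:])

best_l2, scores = select_hp(train, valid, [1e-3, 1e-1, 10.0, 1000.0])
print("selected l2:", best_l2)
```

The selection step only touches demonstration data, which is the constraint the abstract describes. Whether a proxy like this one tracks the true return is exactly the empirical question the paper raises, and the abstract indicates the answer depends on the algorithm and the environment.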

Research Areas

Machine intelligence
