Hyperparameter Selection for Imitation Learning

Léonard Hussenot
Marcin Andrychowicz
Damien Vincent
Lukasz Piotr Stafiniak
Sertan Girgin
Nikola M Momchev
Manu Orsini
Matthieu Geist
Olivier Pietquin
ICML (2021)

Abstract

We address the problem of tuning hyperparameters (HPs) for imitation learning algorithms when the reward function underlying the expert's demonstrations cannot be observed at any time. Most of the imitation learning literature assumes that this reward function is available for HP selection, which is not a realistic setting. Indeed, if this reward function were available, it should be used directly for policy training, and imitation would serve no purpose. To tackle this largely ignored problem, we propose a number of possible proxies to the return and evaluate them in an extensive empirical study across different representative agents and benchmarks. We observe that, depending on the algorithm and the environment, some methods allow good performance to be achieved without using the unknown return.
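
To make the idea of selecting HPs with a proxy to the return concrete, the following is a minimal sketch in Python. The abstract does not name the proxies studied, so the one used here (mean squared error between policy actions and expert actions on held-out demonstrations) is an assumption chosen for illustration, not necessarily one of the paper's methods. The names `action_mse`, `select_by_proxy`, and the `config_*` policies are hypothetical, and the toy linear "expert" stands in for real demonstration data.

```python
import numpy as np


def action_mse(policy, states, expert_actions):
    """Proxy score: mean squared error between the policy's actions and
    the expert's actions on held-out demonstration states (lower is better).
    Assumed proxy for illustration; it never touches the environment reward."""
    predicted = np.stack([policy(s) for s in states])
    return float(np.mean((predicted - expert_actions) ** 2))


def select_by_proxy(candidates, states, expert_actions):
    """Score every candidate policy with the proxy and return the name of
    the best one together with all scores."""
    scores = {
        name: action_mse(policy, states, expert_actions)
        for name, policy in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy held-out demonstrations: the hypothetical expert acts as a = -2 * s.
rng = np.random.default_rng(0)
held_out_states = rng.normal(size=(50, 3))
held_out_actions = -2.0 * held_out_states

# Hypothetical policies, standing in for agents trained under three
# different HP configurations.
candidates = {
    "config_a": lambda s: -2.0 * s,          # matches the expert exactly
    "config_b": lambda s: -1.0 * s,          # partially matches
    "config_c": lambda s: np.zeros_like(s),  # ignores the state
}

best_config, proxy_scores = select_by_proxy(
    candidates, held_out_states, held_out_actions
)
print(best_config, proxy_scores)
```

In this sketch the selection step only needs demonstrations, never the expert's reward. As the abstract cautions, whether such a proxy ranks configurations in the same order as the true return depends on the algorithm and the environment.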
