Primal Wasserstein Imitation Learning

Léonard Hussenot
Matthieu Geist
Olivier Pietquin


Imitation Learning (IL) methods seek to match the behaviour of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL), which ties to the primal form of the Wasserstein distance. We present a reward function that is derived offline, as opposed to recent adversarial IL algorithms, which learn a reward function through interactions with the environment. We show that we can recover expert behaviour on a variety of continuous control tasks of the MuJoCo domain in a sample-efficient manner, in terms of both environment interactions and expert interactions. Finally, we show that the behaviour of the trained agent matches that of the expert as measured by a distance, rather than by the commonly used proxy of performance.
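To make "the primal form of the Wasserstein distance" concrete, the following minimal sketch computes it by brute force for two tiny point sets. It is an illustration only and not the PWIL algorithm or its reward function. It assumes both sets have the same size and uniform weights, in which case an optimal coupling can be taken to be a one-to-one matching, and it uses the Euclidean distance as transport cost. The function name `primal_wasserstein_1` and the sample points are hypothetical.

```python
import itertools
import math


def primal_wasserstein_1(xs, ys):
    """Brute-force 1-Wasserstein distance between two equal-sized,
    uniformly weighted point sets (illustrative; factorial cost)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty point sets of equal size")
    n = len(xs)
    best = math.inf
    # Primal form: minimise the average transport cost over all couplings.
    # With uniform weights and equal sizes, it suffices to search over
    # permutations, i.e. one-to-one matchings between the two sets.
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(xs[i], ys[j]) for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return best


# Hypothetical "agent" and "expert" samples: the same three points,
# shifted vertically by one unit.
agent = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
expert = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
distance = primal_wasserstein_1(agent, expert)
print(distance)
```

Enumerating permutations is only practical for a handful of points. It is used here because it mirrors the definition directly, as a minimum over couplings.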