Primal Wasserstein Imitation Learning
Abstract
Imitation Learning (IL) methods seek to match the behaviour of an agent with that of an expert. In the present work, we propose a new IL method based on a conceptually simple algorithm: \textit{PWIL}, which ties to the primal form of the Wasserstein distance between the state-action distributions of the expert and the agent. We present a reward function which is derived offline, as opposed to recent adversarial IL methods that learn a reward function through interactions with the environment. We show that we can recover expert behaviour on a variety of continuous control tasks of the MuJoCo domain in a sample-efficient manner, both in terms of agent interactions with the environment and in terms of expert demonstrations. Finally, we show that the behaviour of the trained agent matches that of the expert with respect to the Wasserstein distance, rather than the commonly used proxy of performance.
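To give a concrete flavour of what a reward "derived offline" from the primal Wasserstein distance can look like, the sketch below implements a simplified greedy-coupling reward in Python. It is an illustrative assumption, not the paper's exact construction: the class name \texttt{OfflineWassersteinReward}, the constants \texttt{ALPHA} and \texttt{BETA}, and the equal-mass treatment of expert atoms are placeholders chosen for clarity.

\begin{verbatim}
# Illustrative sketch only: a reward built offline from expert
# demonstrations via a greedy coupling of empirical state-action
# distributions; no reward network is trained. ALPHA, BETA and the
# equal-mass matching are assumptions, not the paper's exact recipe.
import numpy as np

ALPHA, BETA = 5.0, 5.0  # hypothetical scaling constants


class OfflineWassersteinReward:
    """Greedy-coupling reward: each agent step consumes the closest
    remaining expert state-action atom."""

    def __init__(self, expert_states, expert_actions):
        # Expert state-action atoms, each carrying equal probability mass.
        self.expert = np.concatenate([expert_states, expert_actions], axis=1)
        self.remaining = np.ones(len(self.expert), dtype=bool)

    def reset(self):
        # Restore all expert atoms at the start of an episode.
        self.remaining[:] = True

    def reward(self, state, action):
        sa = np.concatenate([state, action])
        idx = np.flatnonzero(self.remaining)
        if idx.size == 0:
            return 0.0
        # Greedy step of the primal (optimal transport) coupling:
        # match the agent atom to its nearest unmatched expert atom.
        dists = np.linalg.norm(self.expert[idx] - sa, axis=1)
        self.remaining[idx[np.argmin(dists)]] = False
        # Small transport cost maps to a large reward.
        return ALPHA * np.exp(-BETA * dists.min())
\end{verbatim}

Because the coupling is computed directly from the demonstrations, the reward is available before any agent training starts, in contrast to adversarial IL where the reward is itself learned online.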