Offline Reinforcement Learning with Pseudometric Learning

Robert Dadashi; Shideh Rezaeifar; Nino Vieillard; Léonard Hussenot; Olivier Pietquin; Matthieu Geist

ICML (2021)

Abstract

Offline Reinforcement Learning methods seek to learn a policy from logged transitions of an environment, without any further interaction. With function approximation, and when the logged data covers only part of the environment's state-action space, the policy must be constrained to visit state-action pairs "close" to the support of the logged transitions. In this work, we propose an iterative procedure to learn a pseudometric from logged transitions, and use it to define this notion of closeness. We show that the procedure converges and extend it to the sampled function approximation setting. We then use this pseudometric to define a new look-up based malus (penalty) in an actor-critic algorithm: it encourages the actor to stay close, in terms of the learned pseudometric, to the support of the logged transitions. Finally, we evaluate the method on hand manipulation and locomotion tasks.
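
To make the idea of a look-up based malus concrete, here is a minimal sketch in Python. It is not the authors' implementation: the linear `embed` function stands in for a learned embedding, and the names `lookup_malus`, `penalized_value`, `weights`, and `alpha` are hypothetical. The sketch assumes the pseudometric is the Euclidean distance between embeddings, takes the malus of a state-action pair to be its distance to the nearest logged pair, and subtracts a scaled malus from a value estimate.

```python
import numpy as np

def embed(state_action, weights):
    # Hypothetical embedding: a fixed linear map standing in for a learned network.
    return state_action @ weights

def lookup_malus(query, logged, weights):
    # Pseudometric distance from each query pair to its nearest logged pair,
    # computed as a Euclidean distance between embeddings (an assumption).
    q = embed(np.atleast_2d(query), weights)   # shape (m, k)
    d = embed(logged, weights)                 # shape (n, k)
    dists = np.linalg.norm(q[:, None, :] - d[None, :, :], axis=-1)  # shape (m, n)
    return dists.min(axis=1)

def penalized_value(q_values, malus, alpha=1.0):
    # Subtract the scaled malus so that pairs far from the data look less attractive.
    return q_values - alpha * malus

# Toy data: state-action pairs are 2-D, and the embedding keeps only the first coordinate.
logged = np.array([[0.0, 0.0], [1.0, 0.0]])
weights = np.array([[1.0], [0.0]])
queries = np.array([[0.0, 5.0], [3.0, 0.0]])

malus = lookup_malus(queries, logged, weights)
values = penalized_value(np.array([1.0, 1.0]), malus, alpha=0.5)
```

In this toy setup the first query differs from every logged pair yet receives zero malus, because the embedding ignores the second coordinate. That is the property that separates a pseudometric from a metric: distinct points may be at distance zero.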

Research Areas

Machine intelligence
