A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Pablo Samuel Castro

Tyler Kastner

Prakash Panangaden

Mark Rowland

TMLR (2023)

Abstract

Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which have so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

Research Areas

Machine intelligence
