We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics elegantly capture behavioral equivalence between states and provide strong theoretical guarantees. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has so far rendered them impractical for large problems. In this paper we present two new algorithms for approximating bisimulation metrics in deterministic MDPs. The first computes the metric via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss that allows an approximation to be learned even for continuous-state MDPs, which was not possible prior to this work. The methods we introduce enable the use of bisimulation metrics in problems of much larger scale than was previously possible.
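To make the object of study concrete, the following is a minimal sketch of the exact tabular fixed-point computation that the abstract describes as expensive (it is not either of the paper's two algorithms). In a deterministic MDP with reward table `R[s, a]` and successor table `T[s, a]`, the bisimulation metric is the fixed point of d(s, t) = max_a [ |R(s, a) - R(t, a)| + γ·d(T(s, a), T(t, a)) ]. The toy MDP below (three states, two actions, γ = 0.9) is an illustrative assumption, not an example from the paper.

```python
import numpy as np

# Toy deterministic MDP (hypothetical, for illustration only).
# R[s, a] is the reward for taking action a in state s;
# T[s, a] is the (deterministic) successor state.
R = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])
T = np.array([[1, 2],
              [1, 2],
              [0, 0]])
gamma = 0.9

n_states, n_actions = R.shape
d = np.zeros((n_states, n_states))

# Iterate the bisimulation-metric operator to its fixed point.
# Each sweep is O(|S|^2 |A|), which is what makes the exact
# tabular computation impractical for large state spaces.
for _ in range(1000):
    d_new = np.zeros_like(d)
    for s in range(n_states):
        for t in range(n_states):
            d_new[s, t] = max(
                abs(R[s, a] - R[t, a]) + gamma * d[T[s, a], T[t, a]]
                for a in range(n_actions)
            )
    if np.max(np.abs(d_new - d)) < 1e-8:
        d = d_new
        break
    d = d_new
```

In this toy MDP, states 0 and 1 have identical rewards and successors, so the iteration drives their distance to zero, while states 0 and 2 differ in reward under every action and their distance converges to 1/(1 - γ) = 10.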