MICo: Improved representations via sampling-based state similarity for Markov decision processes

Mark Rowland; Pablo Samuel Castro; Prakash Panangaden; Tyler Kastner

NeurIPS 2021 (2021)

Abstract

We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.

Research Areas

Machine intelligence
