Provably Efficient Maximum Entropy Exploration

Elad Hazan; Sham Kakade; Karan Singh; Abby Van Soest

Provably Efficient Maximum Entropy Exploration

Elad Hazan

Sham Kakade

Karan Singh

Abby Van Soest

ICML (2019)

Download Google Scholar

Abstract

Suppose an agent is in a (possibly unknown) Markov Decision Process in the absence of a reward signal, what might we hope that an agent can efficiently learn to do? One natural, intrinsically defined, objective problem is for the agent to learn a policy which induces a distribution over state space that is as uniform as possible, which can be measured in an entropic sense. We provide an efficient algorithm to construct such a maximum-entropy exploratory policy, when given access to a black box planning oracle (which is robust to function approximation). Furthermore, when restricted to the tabular setting where we have sample based access to the MDP, our proposed algorithm is provably efficient method, both in terms of sample size and computational complexity. Key to our algorithmic methodology is utilizing the conditional gradient method (a.k.a. the Frank-Wolfe algorithm) which utilizes an approximate MDP solver.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Provably Efficient Maximum Entropy Exploration

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs