Google Research

Oracle-Efficient Reinforcement Learning in Factored MDPs with Unknown Structure

NeurIPS 2021 (2021) (to appear)

Abstract

We study provably-efficient reinforcement learning in non-episodic factored Markov decision processes (FMDPs). All previous regret minimization algorithms in this setting made the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the optimism in face of uncertainty principle, combined with a simple statistical method for structure learning, and can be implemented efficiently given oracle-access to an FMDP planner. In addition, we give a variant of our algorithm that remains efficient even when the oracle is limited to non-factored actions, which is the case with almost all existing approximate planners. Finally, we also provide a novel lower bound for the known structure case that matches the best known regret bound of \citet{chen2020efficient}.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work