Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

Tom Zahavy
Haim Kaplan
ALT (2020)

Abstract

In this work we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning that allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem to a Reward Discounted Traveling Salesman Problem and then deriving approximate solutions for it. In particular, we focus on approximate solutions that are local, i.e., solutions that observe only information about the current state. Local policies are easy to implement and require few computational resources, as they do not perform planning. While local deterministic policies, such as Nearest Neighbor, are used in practice for hierarchical reinforcement learning, we propose three stochastic policies that guarantee better performance than any deterministic policy.
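To make the notion of a local deterministic policy concrete, below is a minimal, hypothetical sketch of a Nearest Neighbor policy for reward decomposition: from its current state, the agent always heads toward the closest uncollected subgoal, using only information available at the current state and no planning. The state representation (coordinate tuples), the Euclidean distance metric, and the function names are illustrative assumptions, not taken from the paper.

```python
# A hypothetical Nearest Neighbor local policy: greedily pursue the
# closest remaining subgoal. States and subgoals are modeled as
# coordinate tuples; Euclidean distance is an illustrative choice.
from math import dist


def nearest_neighbor_policy(state, remaining_goals):
    """Return the uncollected subgoal closest to the current state.

    state: coordinate tuple, e.g. an (x, y) grid position.
    remaining_goals: list of subgoal coordinate tuples not yet reached.
    """
    return min(remaining_goals, key=lambda g: dist(state, g))


def greedy_tour(start, goals):
    """Order in which the local policy visits all subgoals from `start`."""
    order, state, remaining = [], start, list(goals)
    while remaining:
        g = nearest_neighbor_policy(state, remaining)
        order.append(g)
        remaining.remove(g)
        state = g  # in a deterministic MDP the agent reaches the subgoal
    return order
```

The policy is "local" in the paper's sense: each decision depends only on the current state and the set of remaining subgoals, never on a lookahead over future trajectories.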
