Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning

Alexander Toshkov Toshev
Brian Andrew Ichter
Dhruv Shah
Peng Xu
Sergey Levine
Yao Lu
ICLR (2022)

Abstract

Reinforcement learning can train policies that effectively perform complex tasks. However, the performance of these methods degrades as the horizon increases, and performing long-horizon tasks often requires reasoning over and composing multiple lower-level skills. Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions, in the form of primitives or options.
However, an effective hierarchy should exhibit abstraction both in the space of actions and states. We posit that a suitable state abstraction for the higher-level policy should depend on the capabilities of the available lower-level policies, and we propose an approach that produces such a representation by using the value functions corresponding to each lower-level skill to capture the affordances for these skills.
Our approach constructs a compact state abstraction that represents the affordances of the scene and is robust to distractors. Empirical evaluations on maze-solving and robotic manipulation tasks demonstrate that it improves long-horizon performance and enables better zero-shot generalization than popular model-free and model-based methods.
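
To make the construction concrete, the sketch below, which is our own illustration rather than code from the paper, assumes each low-level skill exposes a learned value function over observations; the skill-centric state abstraction is then simply the vector of these per-skill value estimates.

# A minimal sketch (not the paper's released code) of assembling a
# skill-centric state abstraction from per-skill value functions.
import numpy as np


class ValueFunctionSpace:
    """Skill-centric state abstraction.

    Each skill k provides a value function V_k(s) estimating how successfully
    that skill can be executed from state s. Stacking these estimates gives a
    K-dimensional embedding that summarizes the scene's affordances, which the
    high-level policy can use in place of the raw observation.
    """

    def __init__(self, skill_value_fns):
        # skill_value_fns: list of callables, each mapping an observation to a
        # scalar value estimate for the corresponding low-level skill.
        self.skill_value_fns = skill_value_fns

    def embed(self, observation):
        # The abstract state is the vector of per-skill value estimates.
        return np.array([v(observation) for v in self.skill_value_fns],
                        dtype=np.float32)


# Hypothetical usage: the high-level policy selects the next skill from the
# embedding z rather than from raw pixels or low-level state.
#   vfs = ValueFunctionSpace([skill.value_fn for skill in skill_bank])
#   z = vfs.embed(obs)
#   next_skill = high_level_policy(z)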

We implement our approach in two domains: a long-horizon maze-solving task and a complex image-based robotic manipulation simulator. In both settings, we show empirically that, when provided with a suitable bank of skills, our approach enables more effective long-horizon control than alternative state representation learning methods.