Distributional reinforcement learning with linear function approximation

Abstract

Geometry-sensitive metrics between probability distributions play an important role in distributional reinforcement learning (Bellemare et al., 2017). In particular, the C51 algorithm can be partially explained in terms of one such metric, the Cramér distance (Rowland et al., 2018). The explanation is partial, however, because C51 uses a softmax to guarantee that its output is a proper distribution, followed by a cross-entropy loss, neither of which is related to the Cramér metric or even geometry-sensitive. In this paper we extend the work of Rowland et al. (2018), which applies to the tabular setting, and ask: can a fully Cramér-based, theoretically sound algorithm be derived in the presence of function approximation? We replace C51's softmax with a simple linear transfer function and derive an algorithm based solely on the Cramér loss. We show that minimizing a variant of the Cramér loss implicitly yields proper distributions in the absence of approximation constraints. We derive a new metric tailored to this transfer function, and provide the first proof of convergence of a distributional algorithm combined with function approximation, in the context of policy evaluation. We also find a surprising negative result: Cramér-based methods, including the original C51 algorithm, should perform worse than directly approximating the value function. As a whole, our results provide new tools for understanding what drives the superior performance of the distributional approach in the approximate setting.
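Since the page reproduces only the abstract, the following is a minimal NumPy sketch of the two ingredients the abstract contrasts: the squared Cramér distance between categorical distributions on a fixed support, and a single Cramér-loss gradient step through a linear transfer function in place of C51's softmax. It illustrates the underlying quantities, not the paper's actual algorithm; the feature vector phi, weight matrix W, learning rate, and stand-in target g are all assumptions made for the example.

```python
import numpy as np

def cramer_loss_sq(p, q, delta_z):
    """Squared Cramer (l_2) distance between two mass vectors p and q on K
    equally spaced atoms with spacing delta_z:
        l_2^2(P, Q) = integral (F_P(x) - F_Q(x))^2 dx
                    = delta_z * sum_i cumsum(p - q)_i ** 2.
    Unlike cross-entropy, this is well defined even when p is unnormalized
    or signed, which is what permits a linear output in place of a softmax."""
    cdf_diff = np.cumsum(p - q)
    return delta_z * np.sum(cdf_diff ** 2)

def cramer_grad(p, q, delta_z):
    """Gradient of cramer_loss_sq with respect to p. Writing the loss as
    delta_z * ||C (p - q)||^2 with C the cumulative-sum matrix, the gradient
    is 2 * delta_z * C.T @ C @ (p - q): a reversed cumsum of the CDF gap."""
    cdf_diff = np.cumsum(p - q)
    return 2.0 * delta_z * np.cumsum(cdf_diff[::-1])[::-1]

# One policy-evaluation step with a linear transfer function: the predicted
# mass vector over K atoms is W.T @ phi(s). There is no softmax, so it may
# be unnormalized or even signed while learning.
rng = np.random.default_rng(0)
d, K, delta_z, lr = 8, 51, 0.1, 5e-3
W = rng.normal(scale=0.1, size=(d, K))      # assumed weight matrix
phi = rng.normal(size=d)
phi /= np.linalg.norm(phi)                  # unit features for a stable step
g = np.full(K, 1.0 / K)                     # stand-in Bellman target

print("before:", cramer_loss_sq(W.T @ phi, g, delta_z))
W -= lr * np.outer(phi, cramer_grad(W.T @ phi, g, delta_z))  # dL/dW = phi (dL/dp)^T
print("after: ", cramer_loss_sq(W.T @ phi, g, delta_z))
```

In C51 itself the prediction would instead be softmax(W.T @ phi), trained with a cross-entropy loss against g; the abstract's point is that the linear-head-plus-Cramér combination is what admits a convergence analysis under function approximation.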
