Distributional reinforcement learning with linear function approximation

Marc G. Bellemare

Nicolas Le Roux

Pablo Samuel Castro

Subhodeep Moitra

ICML (2019)

Google Scholar

Abstract

Geometry-sensitive metrics between probability distributions play an important role in distributional reinforcement learning \citep{bellemare2017distributional}. In particular, the C51 algorithm can be partially explained in terms of one such metric, the Cram\'er distance (Rowland et al., 2018). The explanation is partial, however, because C51 uses a softmax to guarantee that its output is a proper distribution, and subsequently a cross-entropy loss, neither of which are related to the Cram\'er metric nor even geometry-sensitive. In this paper we extend the work of \citet{rowland2018} for the tabular setting and ask the question: can a fully Cram\'er-based, theoretically-sound algorithm be derived in the presence of function approximation? We replace C51's softmax with a simple linear transfer function and derive an algorithm solely based on the Cram\'er loss. We show that minimizing a variant of the Cram\'er loss implicitly yields proper distributions in the absence of approximation constraints. We derive a new metric tailored to this transfer function, and provide the first proof of convergence of a distributional algorithm combined with function approximation, in the context of policy evaluation. We find a surprising negative result showing that Cram\'er-based methods, including the original C51 algorithm, should perform worse than directly approximating the value function.
As a whole, our results provide new tools for understanding what drives the superior performance of the distributional approach in the approximate setting.

Research Areas

Machine Intelligence

Defining the technology of today and tomorrow.

Philosophy

People

Research areas

Foundational ML & Algorithms

Computing Systems & Quantum AI

Science, AI & Society

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Distributional reinforcement learning with linear function approximation

Abstract

Research Areas

Meet the teams driving innovation