A Comparative Analysis of Expected and Distributional Reinforcement Learning
Abstract
Since their introduction a year ago, distributional approaches to reinforcement learning (distributional RL) have produced strong results relative to the standard expectation-based approach (expected RL). However, aside from convergence guarantees, there have been few theoretical results investigating the reasons behind the improvements distributional RL provides. In this paper we begin the investigation into this fundamental question by analyzing the differences between the two approaches in the tabular, linear approximation, and non-linear approximation settings. We prove that in the tabular and linear approximation settings, distributional RL does not provide an advantage over expected RL, and can in fact hurt performance. We then continue with an empirical analysis comparing distributional and expected RL methods in control settings with non-linear approximators to tease apart where the improvements from distributional RL methods come from.