Particle Value Function
Abstract
The policy gradients of the expected return objective can react slowly to rare rewards.
Yet, in some cases, agents may wish to emphasize low or high returns
regardless of their probability. Borrowing from the economics and control literature,
we review the risk-sensitive value function that arises from an exponential
utility and illustrate its effects on an example. This risk-sensitive value function
is not always applicable to reinforcement learning problems, so we introduce
the particle value function defined by a particle filter over the distributions of an
agent’s experience, which bounds the risk-sensitive one. We illustrate the benefit
of the policy gradients of this objective in Cliffworld.
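
As a minimal sketch of the exponential-utility idea mentioned above, the snippet below evaluates the standard risk-sensitive value (1/beta) * log E[exp(beta * R)] for a discrete return distribution. The function name, the toy two-outcome distribution, and the chosen values of beta are illustrative assumptions, not taken from the paper, whose notation may differ. It does not implement the particle value function itself.

```python
import math


def risk_sensitive_value(returns, probs, beta):
    """Return (1/beta) * log E[exp(beta * R)] for a discrete distribution.

    Hypothetical helper for illustration; beta == 0 is treated as the
    risk-neutral limit, i.e. the ordinary expected return.
    """
    if beta == 0.0:
        return sum(p * r for p, r in zip(probs, returns))
    # Log-sum-exp trick: subtract the largest exponent for numerical stability.
    exponents = [beta * r for r in returns]
    m = max(exponents)
    log_mean = m + math.log(
        sum(p * math.exp(e - m) for p, e in zip(probs, exponents))
    )
    return log_mean / beta


# Toy distribution (an assumption): a rare, large reward with probability 0.01.
returns = [0.0, 100.0]
probs = [0.99, 0.01]

expected = risk_sensitive_value(returns, probs, 0.0)      # ordinary expected return
risk_seeking = risk_sensitive_value(returns, probs, 0.1)  # beta > 0 weights high returns
risk_averse = risk_sensitive_value(returns, probs, -0.1)  # beta < 0 weights low returns
```

In this toy case a positive beta pulls the value toward the rare high return and a negative beta pulls it toward the common low return, which is the sense in which the objective can emphasize returns regardless of their probability.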