The Factored Policy-Gradient Planner
Abstract
We present an any-time concurrent probabilistic temporal planner (CPTP) that includes
continuous and discrete uncertainties and metric functions. Rather than relying on dynamic
programming, our approach builds on methods from stochastic local policy search.
That is, we optimise a parameterised policy using gradient ascent. The flexibility of this
policy-gradient approach, combined with its low memory use, the use of function approximation
methods and factorisation of the policy, allows us to tackle complex domains. This
Factored Policy Gradient (FPG) planner can optimise steps to goal, the probability of
success, or a combination of both. We compare the FPG planner to other planners
on CPTP domains, and on simpler but better-studied non-concurrent, non-temporal
probabilistic planning (PP) domains. We present FPG-ipc, the PP version of the planner,
which has been successful in the probabilistic track of the Fifth International Planning
Competition.
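
As a rough illustration of what optimising a parameterised policy by gradient ascent can look like, the following Python sketch runs a REINFORCE-style update on a hypothetical one-step problem. It is not the FPG planner itself: the two actions, their success probabilities, the step size and the running-average baseline are all assumptions chosen purely for illustration.

```python
import math
import random

# Hypothetical toy problem (not from the paper): one decision, two actions,
# each reaching the goal with a fixed probability.
SUCCESS_PROB = [0.8, 0.3]


def softmax(theta):
    """Turn policy parameters into action probabilities."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=5000, alpha=0.1, seed=0):
    """Stochastic gradient ascent on the expected probability of success."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]   # policy parameters, one per action
    baseline = 0.0       # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(theta)
        # Sample an action from the current policy, then a success/failure outcome.
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < SUCCESS_PROB[action] else 0.0
        # Gradient of log pi(action) w.r.t. theta[k] is 1[k == action] - probs[k].
        for k in range(len(theta)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * (reward - baseline) * grad_log
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)


if __name__ == "__main__":
    print(train())
```

In this sketch only the parameter vector and a scalar baseline are kept between samples, which gives some intuition for why a policy-gradient method can use little memory compared with tabulating a value for every state.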