The Factored Policy-Gradient Planner
Abstract
We present an any-time planner for concurrent probabilistic temporal planning (CPTP) that
handles continuous and discrete uncertainties and metric functions. Rather than relying on
dynamic programming, our approach builds on methods from stochastic local policy search:
we optimise a parameterised policy using gradient ascent. The flexibility of this
policy-gradient approach, combined with its low memory use, the use of function
approximation methods, and factorisation of the policy, allows us to tackle complex domains.
This Factored Policy Gradient (FPG) planner can optimise steps to goal, the probability of
success, or a combination of both. We compare the FPG planner to other planners
on CPTP domains, and on simpler but better-studied non-concurrent non-temporal
probabilistic planning (PP) domains. We also present FPG-ipc, the PP version of the planner,
which has been successful in the probabilistic track of the fifth international planning
competition.
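
To make the idea of optimising a factored, parameterised policy by gradient ascent concrete, the following is a minimal sketch on a toy one-step problem. It is not the FPG planner's actual algorithm or domain model. The task, the reward, the running-average baseline, and all names (`train_factored_policy`, `target`, `lr`) are assumptions made for illustration. Each action gets its own independent logistic sub-policy that decides whether to start that action, and the parameters are updated with a REINFORCE-style estimate of the reward gradient.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real parameter to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def train_factored_policy(target=(1, 0, 1), episodes=3000, lr=0.5, seed=0):
    """Toy factored policy gradient (hypothetical task, not the FPG domains).

    One parameter per action: theta[i] controls the probability of
    starting action i. The reward is the fraction of actions whose
    start/skip decision matches `target`.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(target)   # one sub-policy parameter per action
    baseline = 0.0                # running average of reward (variance reduction)
    for _ in range(episodes):
        probs = [sigmoid(t) for t in theta]
        # Each sub-policy samples its own decision independently.
        choices = [1 if rng.random() < p else 0 for p in probs]
        reward = sum(c == g for c, g in zip(choices, target)) / len(target)
        advantage = reward - baseline
        # For a Bernoulli(sigmoid(theta)) policy,
        # d/dtheta log pi(choice) = choice - prob.
        for i, (c, p) in enumerate(zip(choices, probs)):
            theta[i] += lr * advantage * (c - p)
        baseline += 0.05 * (reward - baseline)
    return [sigmoid(t) for t in theta]


if __name__ == "__main__":
    print(train_factored_policy())
```

Because every sub-policy keeps only its own parameters, memory grows with the number of actions rather than with the number of joint action combinations, which is the property the factorisation is meant to illustrate here.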