The Factored Policy-Gradient Planner

Olivier Buffet
Journal of Artificial Intelligence Research (JAIR), 173 (2008), pp. 722-747


We present an any-time concurrent probabilistic temporal planner (CPTP) that includes continuous and discrete uncertainties and metric functions. Rather than relying on dynamic programming, our approach builds on methods from stochastic local policy search. That is, we optimise a parameterised policy using gradient ascent. The flexibility of this policy-gradient approach, combined with its low memory use, the use of function approximation methods and factorisation of the policy, allows us to tackle complex domains. This Factored Policy Gradient (FPG) planner can optimise steps to goal, the probability of success, or attempt a combination of both. We compare the FPG planner to other planners on CPTP domains, and on simpler but better studied non-concurrent non-temporal probabilistic planning (PP) domains. We present FPG-ipc, the PP version of the planner, which has been successful in the probabilistic track of the fifth international planning competition.
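
The idea of optimising a parameterised, factored policy by gradient ascent can be illustrated with a minimal sketch. It is not the FPG planner's implementation: it assumes a plain REINFORCE-style gradient estimate, one independent logistic sub-policy per action, and a hypothetical one-step toy task in which reward is earned only when action 0 is started and action 1 is not. All names (`FactoredPolicy`, `toy_reward`, `train`) are invented for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FactoredPolicy:
    """One independent logistic sub-policy per action (assumed layout)."""

    def __init__(self, n_actions, n_features):
        # theta[k] holds the parameters of the sub-policy for action k.
        self.theta = [[0.0] * n_features for _ in range(n_actions)]

    def probs(self, obs):
        # Probability of starting each action, given the observation.
        return [sigmoid(sum(w * x for w, x in zip(row, obs)))
                for row in self.theta]

    def sample(self, obs, rng):
        # Each action is chosen (1) or skipped (0) independently.
        return [1 if rng.random() < p else 0 for p in self.probs(obs)]

    def grad_log_prob(self, obs, choices):
        # For a Bernoulli-logistic unit:
        # d/dtheta_k log P(choice_k | obs) = (choice_k - p_k) * obs
        return [[(c - p) * x for x in obs]
                for c, p in zip(choices, self.probs(obs))]


def toy_reward(choices):
    # Hypothetical task: succeed only if action 0 starts and action 1 does not.
    return 1.0 if choices == [1, 0] else 0.0


def train(episodes=2000, step_size=0.5, seed=0):
    rng = random.Random(seed)
    policy = FactoredPolicy(n_actions=2, n_features=1)
    obs = [1.0]  # a single constant (bias) feature
    for _ in range(episodes):
        choices = policy.sample(obs, rng)
        reward = toy_reward(choices)
        grads = policy.grad_log_prob(obs, choices)
        # Gradient ascent: each sub-policy updates only its own parameters.
        for k, row in enumerate(grads):
            for i, g in enumerate(row):
                policy.theta[k][i] += step_size * reward * g
    return policy


policy = train()
p_start_0, p_start_1 = policy.probs([1.0])
print(f"P(start action 0) = {p_start_0:.3f}, P(start action 1) = {p_start_1:.3f}")
```

Because each sub-policy stores only its own parameter vector, memory grows with the number of actions and features rather than with the number of states, which is the kind of saving the factorisation is meant to provide.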
