Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Nicol N. Schraudolph
Jin Yu
Advances in Neural Information Processing Systems, The MIT Press, Cambridge, MA (2006), pp. 1185–1192

Abstract

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
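The abstract describes adapting a per-parameter gain vector via stochastic meta-descent (SMD), using fast Hessian-vector products. The paper's own algorithm is not reproduced here, but a minimal, self-contained sketch of SMD-style gain adaptation on a toy diagonal quadratic might look as follows; the function names, constants, and test problem are illustrative assumptions, not the authors' code.

```python
def smd_minimize(grad, hess_vec, theta, p0=0.05, mu=0.05, lam=0.9, steps=400):
    """SMD sketch (illustrative, not the paper's implementation):
    per-parameter gains p are adapted multiplicatively from the
    correlation of the gradient g with an auxiliary vector v, which is
    maintained iteratively using only Hessian-vector products."""
    n = len(theta)
    p = [p0] * n   # per-parameter gain vector
    v = [0.0] * n  # v approximates d(theta)/d(ln p)
    for _ in range(steps):
        g = grad(theta)
        # multiplicative gain update, clipped below at 1/2 for robustness
        p = [pi * max(0.5, 1.0 - mu * gi * vi) for pi, gi, vi in zip(p, g, v)]
        # v update needs only a Hessian-vector product, not the full Hessian
        hv = hess_vec(theta, v)
        v = [lam * vi - pi * (gi + lam * hi)
             for vi, pi, gi, hi in zip(v, p, g, hv)]
        # gradient step with element-wise gains
        theta = [ti - pi * gi for ti, pi, gi in zip(theta, p, g)]
    return theta

# Toy demo (assumed for illustration): f(theta) = 0.5 * sum(a_i * theta_i^2)
a = [1.0, 5.0]
grad = lambda th: [ai * ti for ai, ti in zip(a, th)]
hess_vec = lambda th, v: [ai * vi for ai, vi in zip(a, v)]
theta_star = smd_minimize(grad, hess_vec, [1.0, 1.0])
```

The key property this sketch illustrates is that the curvature information enters only through `hess_vec`, which for differentiable models can be computed at roughly the cost of one extra gradient evaluation, so no Hessian is ever formed or stored.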