Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Nicol N. Schraudolph; Jin Yu; Douglas Aberdeen

Advances in Neural Information Processing Systems, The MIT Press, Cambridge, MA (2006), pp. 1185-1192

Abstract

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
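To make the idea of gain vector adaptation concrete, the following is a minimal sketch of stochastic meta-descent on a toy problem. It is not the paper's implementation: the abstract does not give the update rules, so the sketch assumes the commonly published SMD form, written here for minimizing a deterministic, ill-conditioned quadratic rather than for maximizing an expected reward from sampled trajectories. The names `smd_minimize`, `eta0`, `mu`, and `lam` and their default values are illustrative assumptions, and the dense `matvec` stands in for a fast Hessian-vector product.

```python
def matvec(A, x):
    """Dense matrix-vector product; stands in for a fast Hessian-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def loss(A, theta):
    """Toy objective f(theta) = 0.5 * theta^T A theta."""
    return 0.5 * sum(t * g for t, g in zip(theta, matvec(A, theta)))


def smd_minimize(A, theta0, eta0=0.005, mu=0.05, lam=0.99, steps=200):
    """Gradient descent with one SMD-adapted gain per parameter (assumed update form)."""
    theta = list(theta0)
    eta = [eta0] * len(theta)  # gain vector: one step size per parameter
    v = [0.0] * len(theta)     # running estimate of d(theta) / d(log eta)
    for _ in range(steps):
        g = matvec(A, theta)   # gradient of the quadratic is A @ theta
        Hv = matvec(A, v)      # Hessian-vector product (the Hessian is A here)
        # Gain update: a gain grows when g and v have opposite signs
        # and shrinks, by at most a factor of two, when they agree.
        eta = [e * max(0.5, 1.0 - mu * gi * vi) for e, gi, vi in zip(eta, g, v)]
        # Parameter update with per-parameter gains.
        theta = [t - e * gi for t, e, gi in zip(theta, eta, g)]
        # Update v with decay factor lam, using the Hessian-vector product.
        v = [lam * vi - e * (gi + lam * hvi)
             for vi, e, gi, hvi in zip(v, eta, g, Hv)]
    return theta, eta


A = [[1.0, 0.0], [0.0, 100.0]]  # curvature differs by a factor of 100
theta, gains = smd_minimize(A, [1.0, 1.0])
print(loss(A, [1.0, 1.0]), loss(A, theta), gains)
```

In this sketch each parameter keeps its own gain, so the two directions of very different curvature end up with different step sizes instead of sharing a single global one. In a policy gradient setting the gradient and the Hessian-vector product would be noisy estimates, and the signs would flip for gradient ascent.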

Research Areas

Machine intelligence
