Variance Reduction for Evolution Strategies via Structured Control Variates

Yunhao Tang; Krzysztof Choromanski; Alp Kucukelbir

The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020) (to appear)

Abstract

Evolution Strategies (ES) are a powerful class of blackbox optimization techniques that have recently become a competitive alternative to state-of-the-art policy gradient (PG) algorithms for reinforcement learning (RL). We propose a new method for improving the accuracy of ES algorithms. In contrast to recent approaches, which exploit only the Monte Carlo structure of the gradient estimator, our method takes advantage of the underlying Markov Decision Process (MDP) structure to reduce variance.
We observe that the gradient estimator of the ES objective can alternatively be computed using reparametrization and PG estimators, which leads to new control variate techniques for gradient estimation in ES optimization. We provide theoretical insights and show through extensive experiments that this RL-specific variance reduction approach outperforms general-purpose variance reduction methods.
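
To make the terms in the abstract concrete, the following is a minimal sketch of a Monte Carlo ES gradient estimator with and without a control variate. It is not the structured, MDP-based control variate proposed in the paper: it uses a simple constant baseline, a general-purpose technique of the kind the abstract contrasts against. The toy objective `f`, the helper `es_gradients`, and all parameter values are assumptions chosen purely for illustration.

```python
import numpy as np


def f(x):
    # Toy blackbox objective (an assumption): sum of squares over the last axis.
    return np.sum(x ** 2, axis=-1)


def es_gradients(theta, sigma, n_samples, rng, use_baseline):
    """Per-sample ES estimates of the gradient of E[f(theta + sigma * eps)].

    Each row is f(theta + sigma * eps) * eps / sigma with eps ~ N(0, I).
    With use_baseline=True, f(theta) is subtracted first. Because
    E[f(theta) * eps] = 0, this leaves the mean unchanged and acts as a
    simple control variate.
    """
    eps = rng.standard_normal((n_samples, theta.size))
    values = f(theta + sigma * eps)
    if use_baseline:
        values = values - f(theta)
    return values[:, None] * eps / sigma


rng = np.random.default_rng(0)
theta = np.ones(5)
sigma = 0.1

plain = es_gradients(theta, sigma, 20000, rng, use_baseline=False)
with_cv = es_gradients(theta, sigma, 20000, rng, use_baseline=True)

# Average per-coordinate variance of the single-sample estimates.
var_plain = plain.var(axis=0).mean()
var_cv = with_cv.var(axis=0).mean()

# For this toy objective the smoothed gradient is exactly 2 * theta.
mean_cv = with_cv.mean(axis=0)

print("variance without baseline:", var_plain)
print("variance with baseline:   ", var_cv)
print("mean estimate with baseline:", mean_cv)
```

Both estimators target the same gradient, but the baseline version has much lower variance on this toy problem. The approach described in the abstract instead builds its control variate from reparametrization and PG estimators, which requires an actual MDP and is not reproduced here.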
