Non-Stationary Off-policy Optimization

Joey Hong; Branislav Kveton; Manzil Zaheer; Yinlam Chow; Amr Mahmoud El Houssieny Ahmed

International Conference on Artificial Intelligence and Statistics (AISTATS) (2021)

Abstract

Off-policy learning is a framework for estimating the value of policies and optimizing them offline from logged data, without deploying them. Real-world environments are non-stationary, and optimized policies should be able to adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary environments. Our key idea is to use a change-point detector to partition the logged data into categorical latent states, then find a near-optimal policy conditioned on the latent state. We derive high-probability bounds on our off-policy estimates and optimization. We also propose a practical approach to deploying our policy online, and we evaluate our approach comprehensively on a real-world clickstream dataset.
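
The key idea in the abstract (partition the logged data at change points, then optimize a policy per segment) can be illustrated with a toy sketch. This is not the paper's algorithm: the sliding-window detector, the inverse propensity scoring (IPS) estimator, the function names, the window and threshold values, and the synthetic two-action log are all assumptions chosen for illustration. The sketch also treats every detected segment as its own latent state, whereas categorical latent states could in principle be shared across segments.

```python
# Toy sketch (assumed, not the paper's method): segment a logged bandit
# dataset at detected change points, then pick the best action per segment
# using inverse propensity scoring (IPS).

def ips_values(logs, num_actions):
    """IPS value estimate of each action from (action, reward, propensity) tuples."""
    totals = [0.0] * num_actions
    for action, reward, propensity in logs:
        totals[action] += reward / propensity
    return [total / max(len(logs), 1) for total in totals]

def detect_change_points(logs, num_actions, window, threshold):
    """Hypothetical detector: flag t when the IPS estimates in the windows
    before and after t differ by more than `threshold` for some action."""
    change_points = []
    t = window
    while t + window <= len(logs):
        before = ips_values(logs[t - window:t], num_actions)
        after = ips_values(logs[t:t + window], num_actions)
        gap = max(abs(a - b) for a, b in zip(after, before))
        if gap > threshold:
            change_points.append(t)
            t += window  # skip ahead so one change is not flagged twice
        else:
            t += 1
    return change_points

def optimize_per_segment(logs, num_actions, change_points):
    """Map each segment index (used here as the latent state) to its best action."""
    bounds = [0] + list(change_points) + [len(logs)]
    policy = {}
    for state, (start, end) in enumerate(zip(bounds[:-1], bounds[1:])):
        values = ips_values(logs[start:end], num_actions)
        policy[state] = max(range(num_actions), key=values.__getitem__)
    return policy

# Synthetic log: a uniform logging policy over two actions (propensity 0.5),
# where the rewarding action switches from 0 to 1 at step 100.
logs = []
for t in range(200):
    action = t % 2
    best_action = 0 if t < 100 else 1
    reward = 1.0 if action == best_action else 0.0
    logs.append((action, reward, 0.5))

change_points = detect_change_points(logs, num_actions=2, window=20, threshold=0.75)
policy = optimize_per_segment(logs, num_actions=2, change_points=change_points)
```

On this synthetic log the detector fires slightly before the true change at step 100, because the look-ahead window already contains mostly post-change data. The per-segment policy is unaffected here, but on noisy data a real detector would need a more careful test statistic.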

Research Areas

Machine intelligence
