Advantage Amplification in Slowly Evolving Latent-State Environments

Martin Mladenov; Ofer Meshi; Jayden Ooi; Dale Schuurmans; Craig Boutilier

Proceedings of the Twenty-eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macau, China (2019), pp. 3165-3172

Abstract

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle called advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where the latent state evolves slowly, and we demonstrate their performance empirically in a stylized user-modeling task.

Research Areas

Machine intelligence
