Reinforcement Learning with History Dependent Dynamic Contexts

Guy Tennenholtz; Nadav Merlis; Lior Shani; Martin Mladenov; Craig Boutilier

Proceedings of the 40th International Conference on Machine Learning (ICML 2023), Honolulu, Hawaii

Abstract

We introduce a framework for modeling and solving reinforcement learning problems in non-Markovian, history-dependent environments. Our framework, called the Dynamic Contextual Markov Decision Process (DCMDP), generalizes the contextual MDP framework to handle non-Markovian environments where contexts change over time. To overcome the exponential dependence on history, we leverage an aggregated mapping of previous visits to states, actions, and contexts to construct an optimistic upper confidence-based algorithm, for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm that addresses history-dependent contexts by planning in a latent space and using optimism over history-dependent features. We demonstrate the efficiency and performance of our approach on a recommendation task using the MovieLens dataset, in which the user's behavior is influenced by the agent's recommendations and changes over time.
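
The aggregation idea in the abstract can be pictured with a small sketch. It is an illustration only and not the paper's algorithm: the function names, the count-based summary, and the 1/sqrt(n) bonus with its `scale` parameter are all assumptions chosen to show how an order-free summary of past visits can replace the full history and drive an optimism bonus.

```python
from collections import Counter
import math


def aggregate_history(history):
    """Collapse an ordered history of (state, action, context) triples into visit counts.

    Hypothetical helper: the order of visits is discarded, so the summary
    depends on how often each triple was seen, not on the exact sequence.
    """
    return Counter(history)


def optimistic_bonus(counts, state, action, context, scale=1.0):
    """Return a count-based exploration bonus (an assumed 1/sqrt(n) form).

    Triples that were never visited are treated as visited once, so they
    receive the largest bonus.
    """
    n = counts[(state, action, context)]
    return scale / math.sqrt(max(n, 1))


# Toy history: the triple ("s0", "a1", "c0") is visited twice.
history = [("s0", "a1", "c0"), ("s1", "a0", "c0"), ("s0", "a1", "c0")]
counts = aggregate_history(history)

visited_bonus = optimistic_bonus(counts, "s0", "a1", "c0")
unvisited_bonus = optimistic_bonus(counts, "s1", "a1", "c0")
```

In this toy version, any two histories with the same visit counts map to the same summary, and rarely visited triples get a larger bonus than frequently visited ones.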
