A Model Selection Approach for Corruption Robust Reinforcement Learning

Chen-Yu Wei; Christoph Dann; Julian Zimmert

A Model Selection Approach for Corruption Robust Reinforcement Learning

Chen-Yu Wei

Christoph Dann

Julian Zimmert

33rd International Conference on Algorithmic Learning Theory (ALT 2022) (2022)

Download Google Scholar

Abstract

We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior knowledge on the total amount of corruption, our algorithm achieves a regret bound of O(min{1/∆,√T}+C)where T is the number of episodes, C is the total amount of corruption, and ∆ is the reward gap between the best and the second-best policies. This is the first worst-case optimal bound achieved without knowledge of C, improving previous results of Lykouris et al. (2021); Chen et al. (2021); Wu et al.(2021). For finite-horizon linear MDPs, we develop a computationally efficient algorithm with a regret bound of ̃O(√(1 +C)T), and another computationally inefficient one with O(√T+C),improving the result of Lykouris et al. (2021) and answering an open question by Zhang et al.(2021b). Finally, our model selection framework can be easily applied to other settings including linear bandits, linear contextual bandits, and MDPs with general function approximation, leading to several improved or new results.

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

A Model Selection Approach for Corruption Robust Reinforcement Learning

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs