Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

Orin Levy; Alon Cohen; Asaf Cassel; Yishay Mansour

Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

Orin Levy

Alon Cohen

Asaf Cassel

Yishay Mansour

ICML 2023 (2023)

Download Google Scholar

Abstract

We present the OMG-CMDP! algorithm for regret minimization in adversarial Contextual MDPs. The algorithm operates under the minimal assumptions of realizable function class and access to online least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient online regression oracles), simple and robust to approximation errors. It enjoys an
$\widetilde{O}(H^{2.5} \sqrt{ T|S||A| (
} \linebreak[1]
\overline{ \mathcal{R}(\mathcal{O})
+ H \log(\delta^{-1}) )})$ regret guarantee,
with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon and $\mathcal{R}(\mathcal{O}) = \mathcal{R}(\mathcal{O}_{\mathrm{sq}}^\mathcal{F}) + \mathcal{R}(\mathcal{O}_{\mathrm{log}}^\mathcal{P})$
is the sum of the regression oracles' regret, used to approximate the context-dependent rewards and dynamics, respectively. To the best of our knowledge, our algorithm is the first efficient and rate optimal regret minimization algorithm for adversarial CMDPs which operates under the minimal and standard assumption of online function approximation. Our technique relies on standard convex optimization algorithms, and we show that it is robust to approximation errors.

Research Areas

Machine intelligence

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs