# Christoph Dann

My research focuses on fundamental questions in reinforcement learning and the design of efficient algorithmic solutions for decision making under uncertainty. For more information, see my website cdann.net.

### Research Areas

Authored Publications

Sort By

Multiple-policy High-confidence Policy Evaluation

Mohammad Ghavamzadeh

International Conference on Artificial Intelligence and Statistics(2023), pp. 9470-9487

Preview abstract
In reinforcement learning applications, we often want to accurately estimate the return of several policies of interest. We study this problem, multiple-policy high-confidence policy evaluation, where the goal is to estimate the return of all given target policies up to a desired accuracy with as few samples as possible. The natural approaches to this problem, i.e., evaluating each policy separately or estimating a model of the MDP, scale with the number of policies to evaluate or the size of the MDP, respectively. We present an alternative approach based on reusing samples from on-policy Monte-Carlo estimators and show that it is more sample-efficient in favorable cases. Specifically, we provide guarantees in terms of a notion of overlap of the set of target policies and shed light on when such an approach is indeed beneficial compared to existing methods.
View details

A Model Selection Approach for Corruption Robust Reinforcement Learning

Chen-Yu Wei

33rd International Conference on Algorithmic Learning Theory (ALT 2022)(2022)

Preview abstract
We develop a model selection approach to tackle reinforcement learning with adversarial corruption in both transition and reward. For finite-horizon tabular MDPs, without prior knowledge on the total amount of corruption, our algorithm achieves a regret bound of O(min{1/∆,√T}+C)where T is the number of episodes, C is the total amount of corruption, and ∆ is the reward gap between the best and the second-best policies. This is the first worst-case optimal bound achieved without knowledge of C, improving previous results of Lykouris et al. (2021); Chen et al. (2021); Wu et al.(2021). For finite-horizon linear MDPs, we develop a computationally efficient algorithm with a regret bound of ̃O(√(1 +C)T), and another computationally inefficient one with O(√T+C),improving the result of Lykouris et al. (2021) and answering an open question by Zhang et al.(2021b). Finally, our model selection framework can be easily applied to other settings including linear bandits, linear contextual bandits, and MDPs with general function approximation, leading to several improved or new results.
View details

Policy Certificates: Towards Accountable Reinforcement Learning

Lihong Li

Wei Wei

Emma Brunskill

International Conference on Machine Learning (ICML)(2019)

Preview abstract
The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications, such as healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms achieves regret and PAC bounds that are minimax up to lower-order terms.
View details