Teodor Vanislavov Marinov
My main research interests are in the field of Theoretical Machine Learning. Recently, my research has focused on Reinforcement Learning, with applications to compiler optimization, and on how to make Large Language Models (LLMs) more factual. On the more theoretical side, I am interested in Bandit Problems, more efficient algorithms for Reinforcement Learning beyond worst-case settings, and understanding the emergent abilities of LLMs.
Authored Publications
Multiple-policy High-confidence Policy Evaluation
Mohammad Ghavamzadeh
International Conference on Artificial Intelligence and Statistics (2023), pp. 9470-9487
In reinforcement learning applications, we often want to accurately estimate the return of several policies of interest. We study this problem, multiple-policy high-confidence policy evaluation, where the goal is to estimate the return of all given target policies up to a desired accuracy with as few samples as possible. The natural approaches to this problem, i.e., evaluating each policy separately or estimating a model of the MDP, scale with the number of policies to evaluate or the size of the MDP, respectively. We present an alternative approach based on reusing samples from on-policy Monte-Carlo estimators and show that it is more sample-efficient in favorable cases. Specifically, we provide guarantees in terms of a notion of overlap of the set of target policies and shed light on when such an approach is indeed beneficial compared to existing methods.
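The abstract contrasts a sample-reuse approach with the natural baseline of evaluating each policy separately. The following minimal Python sketch illustrates why that baseline scales with the number of target policies. It is not the paper's algorithm. It assumes a hypothetical toy finite-horizon problem with Bernoulli rewards and deterministic policies given as one action per time step. The names `hoeffding_samples`, `rollout`, and `evaluate_separately` are illustrative only. Each policy's Monte Carlo budget is sized with Hoeffding's inequality plus a union bound, and no episode is shared, so the total number of episodes grows at least linearly in the number of policies.

```python
import math
import random


def hoeffding_samples(eps, delta, num_policies, return_range):
    """Episodes per policy so that, with probability at least 1 - delta, all
    num_policies Monte Carlo estimates are within eps of their true returns.
    Hoeffding's inequality bounds each failure by 2*exp(-2*n*eps^2 / range^2);
    a union bound over the policies gives the log(2K/delta) term."""
    return math.ceil(return_range ** 2 * math.log(2 * num_policies / delta)
                     / (2 * eps ** 2))


def rollout(policy, reward_means, rng):
    """Run one episode of the hypothetical toy problem: at step t the policy
    takes action policy[t] and receives a Bernoulli reward with mean
    reward_means[t][action]."""
    return sum(1.0 if rng.random() < reward_means[t][a] else 0.0
               for t, a in enumerate(policy))


def evaluate_separately(policies, reward_means, eps, delta, seed=0):
    """Baseline: estimate each policy's return from its own on-policy episodes.
    No episode is shared between policies, so the total sample count grows
    with len(policies)."""
    rng = random.Random(seed)
    horizon = len(reward_means)  # per-step rewards are 0/1, so returns lie in [0, horizon]
    n = hoeffding_samples(eps, delta, len(policies), return_range=horizon)
    estimates = [sum(rollout(pi, reward_means, rng) for _ in range(n)) / n
                 for pi in policies]
    return estimates, n * len(policies)


# Usage: three deterministic policies on a horizon-3 toy problem.
reward_means = [[0.2, 0.8], [0.5, 0.4], [0.9, 0.1]]
policies = [(1, 0, 0), (1, 1, 0), (0, 0, 0)]
estimates, total_episodes = evaluate_separately(policies, reward_means,
                                                eps=0.1, delta=0.05)
# Exact expected returns, for comparison with the Monte Carlo estimates.
true_returns = [sum(reward_means[t][a] for t, a in enumerate(pi))
                for pi in policies]
```

Note that the first two policies agree on their first and last actions. That kind of overlap is what a sample-reuse method could exploit, but this sketch deliberately leaves reuse out rather than guess at how the paper implements it.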