I am a machine learning scientist at Google Research in Mountain View. I was at Adobe Research from 2014 to 2018, at Technicolor's Research Center from 2011 to 2014, and at Intel Research from 2006 to 2011. Before 2006, I was a graduate student in the Intelligent Systems Program at the University of Pittsburgh. My advisor was Milos Hauskrecht. My e-mail is email@example.com. I propose, analyze, and apply algorithms that learn incrementally, run in real time, and converge to near-optimal solutions as the number of observations increases. Most of my recent work focuses on designing multi-armed bandit algorithms for structured real-world problems.
The exploration-exploitation trade-off is fundamental to any learning problem: the learner must balance exploration actions, which lead to a better model, against exploitation actions, which yield the highest reward under the latest model estimate. This trade-off is often modeled as a multi-armed bandit, an online learning problem where the actions of the learning agent are arms. In practice, the arms can be treatments in a clinical trial or ads on a website. After the agent pulls an arm, it receives that arm's reward. The goal of the agent is to maximize its cumulative reward. Because the agent does not know the rewards of the arms in advance, it faces the so-called exploration-exploitation dilemma: explore, and learn more about the arms; or exploit, and pull the arm with the highest estimated reward so far.
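As a concrete illustration of how an agent can resolve this dilemma, here is a minimal sketch of the classic UCB1 index policy on Bernoulli arms. This is a generic textbook algorithm, not code from my own work, and all names, means, and the horizon below are illustrative:

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Sketch of UCB1 on Bernoulli arms with the given (unknown to the agent) mean rewards."""
    rng = random.Random(seed)
    n = len(means)
    pulls = [0] * n    # number of times each arm was pulled
    total = [0.0] * n  # cumulative reward of each arm
    reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            arm = t - 1  # initialization: pull each arm once
        else:
            # Index = empirical mean + confidence bonus. The bonus shrinks as an
            # arm is pulled more, so rarely pulled arms are explored automatically.
            arm = max(range(n), key=lambda i:
                      total[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]))
        r = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward draw
        pulls[arm] += 1
        total[arm] += r
        reward += r
    return reward, pulls
```

Over a long enough horizon, the confidence bonuses of suboptimal arms decay and the arm with the highest mean receives most of the pulls, which is exactly the explore-then-commit behavior the dilemma calls for.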
I have made several fundamental contributions to the field of multi-armed bandits. My
earlier work focused on structured bandit problems with graphs, submodularity,
semi-bandit feedback, and low-rank matrices. This culminated in my work on online
learning to rank, where we design bandit algorithms that handle both combinatorial
action sets and partial feedback. These algorithms are simple, theoretically sound,
robust, and remain the state of the art. My recent work focuses on making bandit algorithms practical. This includes follow-the-perturbed-leader exploration, which can be analyzed in models as general as generalized linear bandits and applied to neural networks; latent bandits, which can be combined with graphical models learned offline; and even learning bandit algorithms from logged data.