Google Research

Branislav Kveton


I am a machine learning scientist at Google Research in Mountain View. I was at Adobe Research from 2014 to 2018, at Technicolor's Research Center from 2011 to 2014, and at Intel Research from 2006 to 2011. Before 2006, I was a graduate student in the Intelligent Systems Program at the University of Pittsburgh. My advisor was Milos Hauskrecht. My e-mail is I propose, analyze, and apply algorithms that learn incrementally, run in real time, and converge to near optimal solutions as the number of observations increases. Most of my recent work focuses on designing multi-armed bandit algorithms for structured real-world problems.

Exploration-exploitation trade-off is a fundamental trade-off in any learning problem, between taking exploration actions that lead to learning a better model and taking exploitation actions that have the highest reward under the latest model estimate. This trade-off is often modeled as a multi-armed bandit. A multi-armed bandit is an online learning problem where actions of the learning agent are arms. In practice, the arms can be treatments in a clinical trial or ads on a website. After the arm is pulled, the agent receives its reward. The goal of the agent is to maximize its cumulative reward. The agent does not know the rewards of the arms in advance and faces the so-called exploration-exploitation dilemma: explore, and learn more about the arms; or exploit, and pull the arm with the highest estimated reward thus far.

I made several fundamental contributions to the field of multi-armed bandits. My earlier work focused on structured bandit problems with graphs, submodularity, semi-bandit feedback, and low-rank matrices. This culminated in my work on online learning to rank, where we design bandit algorithms that handle both combinatorial action sets and partial feedback. These algorithms are simple, theoretically sound, robust, and remain the state of the art. My recent work focuses on making bandit algorithms practical. This involves follow-the-perturbed-leader exploration, which can be analyzed up to generalized linear bandits and applied to neural networks; latent bandits, which can be combined with offline graphical models; and even learning of bandit algorithms from logged data.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work