I propose, analyze, and apply algorithms that learn incrementally, run in real time, and converge to near optimal solutions as the number of observations increases. Most of my recent work focuses on designing multi-armed bandit algorithms for structured real-world problems, which involve ranked lists, matrices, and graphs.

Online learning to rank is a challenging problem because the number of ranked lists is exponential in the length of the list. For instance, the number of lists of length 10 over 1000 web pages is approximately 1000

^{10}. The problem of learning the most relevant list can be solved sample efficiently by leveraging the structure of the lists, such as that any list with relevant items is relevant. The main challenge with estimating the relevance of individual items is that the click patterns of users tend to be subject to strong item and position biases. In particular, the recommended item is often not clicked because the item is not examined, due to the position or item bias; and not because it is not relevant. I design online learning to rank algorithms for stochastic click models that address these challenges.

Another interesting real-world problem is finding the maximum entry of a stochastic low-rank matrix. Consider the following motivating problem. A marketer of a product can take a pair of actions from two sets of actions, K population segments and L marketing channels. Given a product, some segments are easier to market to and some channels are more appropriate. Now suppose that the conversion occurs only if both actions are successful and that each action affects the conversion independently. Then the pair of the population segment and marketing channel that maximizes the conversion rate is the maximum entry of a rank-1 matrix. I design adaptive algorithms that can learn the maximum entry of a stochastic low-rank matrix sample efficiently, without sensing all of its entries.