# Ashwinkumar Badanidiyuru Varadaraja

## Authored Publications

Follow-ups Matter: Improving Contextual Bandits via Post-serving Features

Chaoqi Wang

Ziyu Ye

Haifeng Xu

NeurIPS-23 (Spotlight) (2023)

In the realm of contextual bandit algorithms, the standard framework involves observing a context, selecting an arm, and then observing the reward. This approach, while functional, often falls short when dealing with the complexity of real-world scenarios where additional valuable contexts are revealed after arm-selection. We introduce a new algorithm, pLinUCB, designed to incorporate these post-serving contexts effectively, thereby achieving sublinear regret. Our key technical contribution is a robustified and generalized version of the well-known Elliptical Potential Lemma (EPL), which allows us to handle the noise in the context vectors, a crucial aspect in a practical setting. This generalized EPL is of independent interest as it has potential applications beyond the scope of this work. Through extensive empirical tests on both synthetic and real-world datasets, we demonstrate that our proposed algorithm outperforms the state-of-the-art, thus establishing its practical potential. Our work underscores the importance of post-serving contexts in the contextual bandit setting and lays the groundwork for further research in this field.
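As background for readers new to the setting, the standard LinUCB selection rule that pLinUCB builds on can be sketched as follows (an illustrative simplification with our own function and variable names, not the paper's pLinUCB algorithm):

```python
import numpy as np

def linucb_choose(contexts, A_list, b_list, alpha=1.0):
    # contexts: one d-dim feature vector per arm
    # A_list[k]: identity plus sum of x x^T over past pulls of arm k
    # b_list[k]: sum of reward * x over past pulls of arm k
    scores = []
    for x, A, b in zip(contexts, A_list, b_list):
        theta = np.linalg.solve(A, b)                        # ridge estimate of arm weights
        bonus = alpha * np.sqrt(x @ np.linalg.solve(A, x))   # optimistic confidence width
        scores.append(x @ theta + bonus)
    return int(np.argmax(scores))
```

pLinUCB additionally incorporates the post-serving context revealed after arm selection; the robustified Elliptical Potential Lemma in the paper is what controls the noise this introduces.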

Leveraging Bias-Variance Trade-offs for Regression with Label Differential Privacy

Avinash Varadarajan

Chiyuan Zhang

Ethan Leeman

Pritish Kamath

NeurIPS 2023 (2023)

We propose a new family of label randomization mechanisms for the task of training regression models under the constraint of label differential privacy (DP). In particular, we leverage the trade-offs between bias and variance to construct better noising mechanisms depending on a privately estimated prior distribution over the labels. We demonstrate that these mechanisms achieve state-of-the-art privacy-accuracy trade-offs on several datasets, highlighting the importance of bias-reducing constraints when training neural networks with label DP. We also provide theoretical results shedding light on the structural properties of the optimal bias-reduced mechanisms.
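For context, the simplest label-DP baseline that such bias-variance-aware mechanisms improve upon is plain Laplace noising of the regression labels (a hedged sketch with our own names; this is not the paper's prior-dependent, bias-reduced mechanism):

```python
import numpy as np

def laplace_label_dp(labels, epsilon, label_range):
    # Baseline epsilon-label-DP mechanism: add Laplace noise calibrated
    # to the label range. Unbiased, but high variance for small epsilon.
    sensitivity = label_range[1] - label_range[0]
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=len(labels))
    return np.asarray(labels, dtype=float) + noise
```

The paper's mechanisms instead trade a controlled amount of bias for much lower variance, guided by a privately estimated prior over the labels.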

Learning to Bid in Contextual First Price Auctions

Guru Prashanth Guruganesh

The Proceedings of the ACM Web Conference 2023

In this paper, we investigate how to bid in repeated contextual first-price auctions. We consider a single bidder (learner) who repeatedly bids in first-price auctions: at each time $t$, the learner observes a context $x_t \in \mathbb{R}^d$ and decides the bid based on historical information and $x_t$. We assume a structured linear model of the maximum bid of all the others, $m_t = \alpha_0 \cdot x_t + z_t$, where $\alpha_0 \in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly sampled from a noise distribution $F$ with log-concave density function $f$. We consider both binary feedback (the learner can only observe whether she wins or not) and full information feedback (the learner can observe $m_t$) at the end of each time $t$. For binary feedback, when the noise distribution $F$ is known, we propose a bidding algorithm based on the maximum likelihood estimation (MLE) method that achieves regret at most $\tilde{O}(\sqrt{\log(d)\,T})$. Moreover, we generalize this algorithm to the setting with binary feedback where the noise distribution is unknown but belongs to a parametrized family of distributions. For full information feedback with unknown noise distribution, we provide an algorithm that achieves regret at most $\tilde{O}(\sqrt{dT})$. Our approach combines an estimator for log-concave density functions with the MLE method to learn the noise distribution $F$ and the linear weight $\alpha_0$ simultaneously. We also provide a lower bound: any bidding policy in a broad class must incur regret at least $\Omega(\sqrt{T})$, even when the learner receives full information feedback and $F$ is known.
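To illustrate the binary-feedback MLE step, here is a toy sketch that assumes the noise distribution $F$ is logistic, so that the win probability is $\sigma(b_t - \alpha \cdot x_t)$ and fitting $\alpha$ reduces to a logistic-regression-style gradient ascent (the logistic choice and all names below are ours for illustration; the paper only requires a log-concave density):

```python
import numpy as np

def fit_alpha_mle(X, bids, wins, steps=500, lr=0.1):
    # Toy MLE for the binary-feedback model with logistic noise:
    # P(win_t) = sigmoid(bid_t - alpha . x_t); fit alpha by gradient ascent
    # on the log-likelihood of the observed win/loss indicators.
    alpha = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(bids - X @ alpha)))  # predicted win probabilities
        grad = X.T @ (p - wins)                        # gradient of the log-likelihood
        alpha += lr * grad / len(bids)
    return alpha
```

With the model well specified, the estimate concentrates around the true $\alpha_0$ as the number of rounds grows, which is the driver of the regret bound above.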

Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards

Tianxi Li

Haifeng Xu

Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022) (2022)

Incrementality, which measures the causal effect of showing an ad to a potential customer (e.g. a user on an internet platform) versus not showing it, is a central problem for advertisers in advertising systems. In this paper, we investigate how the advertiser can decide the bid by learning conversion incrementality in an online manner. We formulate this problem as an episodic Markov Decision Process (MDP) and propose a novel reinforcement learning (RL) algorithm that achieves regret at most $O(H^2\sqrt{T})$, which does not depend on the number of actions (bids), where $H$ is the number of rounds in each episode and $T$ is the total number of episodes in the MDP. In sharp contrast with the standard RL framework, conversion incrementality feedback is delayed and mixed. To handle this difficulty, we propose a novel pairwise moment-matching algorithm to learn conversion incrementality, which is of independent interest.

Handling many conversions per click in modeling delayed feedback

Andrew Evdokimov

Vinodh Krishnan

Pan Li

Wynn Vonnegut

Jayden Wang

ADKDD 2021 (2021)

Predicting the expected number of post-click conversions (purchases or other events) is a key task in performance-based digital advertising. In training a conversion optimizer model, one of the most crucial aspects is handling the delayed feedback with respect to conversions, which can happen multiple times with various delays. This task is difficult, as the delay distribution is different for each advertiser, is long-tailed, often does not follow any particular class of parametric distributions, and can change over time. We tackle these challenges using an unbiased estimation model built on three ideas: first, split the label into a sum of labels with different delay buckets, each of which trains only on mature labels; second, use thermometer encoding to increase accuracy and reduce inference cost; and third, use auxiliary information to increase the stability of the model and to handle drift in the distribution.
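The thermometer encoding mentioned above can be illustrated in a few lines (a generic sketch; the bucket layout used in a production model may differ):

```python
def thermometer_encode(count, num_buckets):
    # Thermometer encoding of a conversion count: bit i is 1 iff count > i.
    # Unlike one-hot, nearby counts share bits, which smooths learning,
    # and counts beyond the last bucket saturate instead of overflowing.
    return [1 if count > i else 0 for i in range(num_buckets)]
```

For example, a count of 2 with 4 buckets encodes as `[1, 1, 0, 0]`, and any count above the bucket range saturates to all ones.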

Online Learning via Offline Greedy: Applications in Market Design and Optimization

Rad Niazadeh

Negin Golrezaei

Joshua Wang

Fransisca Susan

EC 2021, Management Science Journal (2020)

Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms into their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms into online ones using Blackwell approachability. We show that the resulting online algorithms have $O(T^{1/2})$ (approximate) regret in the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability, and leverage it to transform greedy robust offline algorithms into online ones with $O(T^{2/3})$ (approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the currently known bounds.

Response Prediction for Low-Regret Agents

Sadra Yazdanbod

Web and Internet Economics 2019

Companies like Google and Microsoft run billions of auctions every day to sell advertising opportunities. Any change to the rules of these auctions can have a tremendous effect on the revenue of the company and the welfare of the advertisers and the users. Therefore, any change requires careful evaluation of its potential impacts. Currently, such impacts are often evaluated by running simulations or small controlled experiments. This, however, misses the important factor that the advertisers respond to changes. Our goal is to build a theoretical framework for predicting the actions of an agent (the advertiser) that is optimizing her actions in an uncertain environment. We model this problem using a variant of the multi-armed bandit setting where playing an arm is costly. The cost of each arm changes over time and is publicly observable. The value of playing an arm is drawn stochastically from a static distribution and is observed by the agent, but not by us; we, however, observe the actions of the agent. Our main result is that, assuming the agent is playing a strategy with a regret of at most $f(T)$ within the first $T$ rounds, we can learn to play the multi-armed bandit game (without observing the rewards) in such a way that the regret of our selected actions is at most $O(k^4 (f(T) + 1) \log(T))$.

Locally adaptive optimization: adaptive seeding for monotone submodular functions

Christos Papadimitriou

Aviad Rubinstein

Lior Seeman

Yaron Singer

SODA (2016)

The Adaptive Seeding problem is an algorithmic challenge motivated by influence maximization in social networks: One seeks to select among certain accessible nodes in a network, and then select, adaptively, among neighbors of those nodes as they become accessible in order to maximize a global objective function. More generally, adaptive seeding is a stochastic optimization framework where the choices in the first stage affect the realizations in the second stage, over which we aim to optimize.
Our main result is a (1 - 1/e)^2-approximation for the adaptive seeding problem for any monotone submodular function. While adaptive policies are often approximated via non-adaptive policies, our algorithm is based on a novel method we call locally-adaptive policies. These policies combine a non-adaptive global structure, with local adaptive optimizations. This method enables the (1 - 1/e)^2-approximation for general monotone submodular functions and circumvents some of the impossibilities associated with non-adaptive policies.
We also introduce a fundamental problem in submodular optimization that may be of independent interest: given a ground set of elements where every element appears with some small probability, find a set of expected size at most k that has the highest expected value over the realization of the elements. We show a surprising result: there are classes of monotone submodular functions (including coverage) that can be approximated almost optimally as the probability vanishes. For general monotone submodular functions we show via a reduction from Planted-Clique that approximations for this problem are not likely to be obtainable. This optimization problem is an important tool for adaptive seeding via non-adaptive policies, and its hardness motivates the introduction of locally-adaptive policies we use in the main result.
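As background, the non-adaptive building block in this line of work is the classic greedy algorithm, which gives a $(1 - 1/e)$-approximation for monotone submodular maximization under a cardinality constraint (a generic sketch, not the paper's locally-adaptive policy):

```python
def greedy_submodular(ground_set, f, k):
    # Classic greedy for a monotone submodular set function f:
    # repeatedly add the element with the largest marginal gain.
    S = set()
    for _ in range(k):
        best = max((e for e in ground_set if e not in S),
                   key=lambda e: f(S | {e}) - f(S), default=None)
        if best is None:
            break
        S.add(best)
    return S
```

For a coverage function, for instance, each step picks the set covering the most new elements; the paper's locally-adaptive policies wrap local adaptive optimizations around a non-adaptive global structure of this kind.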

Fast algorithms for maximizing submodular functions

Resourceful contextual bandits

Streaming Submodular Maximization: Massive Data Summarization on the Fly

Robust Multi-objective Learning with Mentor Feedback

Alekh Agarwal

Miroslav Dudik

Robert E. Schapire

Aleksandrs Slivkins

COLT 2014

Bandits with Knapsacks

Robert Kleinberg

Aleksandrs Slivkins

IEEE Symposium on Foundations of Computer Science (FOCS) (2013)

Optimization with Demand Oracles

Approximating Low-Dimensional Coverage Problems

Sketching Valuation Functions

Shahar Dobzinski

Hu Fu

Robert Kleinberg

Noam Nisan

Tim Roughgarden

SODA 2012

Learning on a Budget: Posted Price Mechanisms for Online Procurement

Buyback Problem - Approximate matroid intersection with cancellation costs

ICALP 2011

Randomized Online Algorithms for the Buyback Problem