Jump to Content
Ashwinkumar Badanidiyuru Varadaraja

Ashwinkumar Badanidiyuru Varadaraja

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract In the realm of contextual bandit algorithms, the standard framework involves observing a context, selecting an arm, and then observing the reward. This approach, while functional, often falls short when dealing with the complexity of real-world scenarios where additional valuable contexts are revealed after arm-selection. We introduce a new algorithm, pLinUCB, designed to incorporate these post-serving contexts effectively, thereby achieving sublinear regret. Our key technical contribution is a robustified and generalized version of the well-known Elliptical Potential Lemma (EPL), which allows us to handle the noise in the context vectors, a crucial aspect in a practical setting. This generalized EPL is of independent interest as it has potential applications beyond the scope of this work. Through extensive empirical tests on both synthetic and real-world datasets, we demonstrate that our proposed algorithm outperforms the state-of-the-art, thus establishing its practical potential. Our work underscores the importance of post-serving contexts in the contextual bandit setting and lays the groundwork for further research in this field. View details
    Learning to Bid in Contextual First Price Auctions
    Guru Prashanth Guruganesh
    The Proceedings of the ACM Web Conference 2023
    Preview abstract In this paper, we investigate the problem about how to bid in repeated contextual first price auctions. We consider a single bidder (learner) who repeatedly bids in the first price auctions: at each time t, the learner observes a context xt ∈ R d and decides the bid based on historical information and xt. We assume a structured linear model of the maximum bid of all the others mt = α0 · xt + zt, where α0 ∈ R d is unknown to the learner and zt is randomly sampled from a noise distribution F with log-concave density function f. We consider both binary feedback (the learner can only observe whether she wins or not) and full information feedback (the learner can observe mt) at the end of each time t. For binary feedback, when the noise distribution F is known, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method to achieve at most Oe( p log(d)T ) regret. Moreover, we generalize this algorithm to the setting with binary feedback and the noise distribution is unknown but belongs to a parametrized family of distributions. For the full information feedback with unknown noise distribution, we provide an algorithm that achieves regret at most Oe( √ dT). Our approach combines an estimator for log-concave density functions and then MLE method to learn the noise distribution F and linear weight α0 simultaneously. We also provide a lower bound result such that any bidding policy in a broad class must achieve regret at least Ω(√ T), even when the learner receives the full information feedback and F is known. View details
    Preview abstract We propose a new family of label randomization mechanisms for the task of training regression models under the constraint of label differential privacy (DP). In particular, we leverage the trade-offs between bias and variance to construct better noising mechanisms depending on a privately estimated prior distribution over the labels. We demonstrate that these mechanisms achieve state-of-the-art privacy-accuracy trade-offs on several datasets, highlighting the importance of bias-reducing constraints when training neural networks with label DP. We also provide theoretical results shedding light on the structural properties of the optimal bias-reduced mechanisms. View details
    Incrementality Bidding via Reinforcement Learning under Mixed and Delayed Rewards
    Tianxi Li
    Haifeng Xu
    Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022) (2022)
    Preview abstract Incrementality, which is used to measure the causal effect of showing an ad to a potential customer (e.g. a user in an internet platform) versus not, is a central problem for advertisers in advertising systems. In this paper, we investigate the problem about how the advertiser can decide the bid through learning conversion incrementality, in an online manner. We formulate this problem as an episodic Markov Decision Process (MDP) and propose a novel reinforcement learning (RL) algorithm that can achieve regret at most $O(H^2\sqrt{T})$, which doesn't rely on the number of actions (bids), where $H$ is the number of rounds in each episode and $T$ is the total number of episodes in MDP. In sharp contrast with standard RL framework, conversion incrementality feedback are delayed and mixed. To handle this difficulty we propose a novel pairwise moment-matching algorithm to learn conversion incrementality, which is of independent of interest. View details
    Preview abstract In performance based digital advertising, one of the key technical tools is to predict the expected value of post ad click purchases (a.k.a. conversions). Some of the salient aspects of this problem such as having a non-binary label and advertisers reporting the label in different scales make it a much harder problem than predicting probability of a click. In this paper we ask what is a good way to model the label and extract as much information as possible. A label can affect the model in multiple ways and we consider three such directions and come up with new techniques for each of them. The first is that the label scale can affect how the model capacity is devoted to different advertisers, the second is how labels for outliers can affect over-fitting and the third is if we can use information in the distribution of the label and not just the mean. View details
    Handling many conversions per click in modeling delayed feedback
    Andrew Evdokimov
    Vinodh Krishnan
    Pan Li
    Wynn Vonnegut
    Jayden Wang
    ADKDD 2021 (2021)
    Preview abstract Predicting the expected number of post-click conversions (purchases or other events) is a key task in performance-based digital advertising. In training a conversion optimizer model, one of the most crucial aspect is handling the delayed feedback with respect to conversions, which can happen multiple times with various delay. This task is difficult, as the delay distribution is different for each advertiser, is long-tailed, often does not follow any particular class of parametric distributions, and can change over time. We tackle these challenges using an unbiased estimation model with the following three ideas. The first idea is to split the label as a sum of labels with different delay buckets, each of which trains only on mature label, the second is to use thermometer encoding to increase accuracy and reduce inference cost, and the third is to use auxiliary information to increase the stability of the model and to handle drift in distribution. View details
    Preview abstract Classic mechanism design often assumes that a bidder’s action is restricted to report a type or a signal, possibly untruthfully. In today’s digital economy, bidders are holding increasing amount of private information about the auctioned items. And due to legal or ethical concerns, they would demand to reveal partial but truthful information, as opposed to report untrue signal or misinformation. To accommodate such bidder behaviors in auction design, we propose and study a novel mechanism design setup where each bidder holds two kinds of information: (1) private value type, which can be misreported; (2) private information variable, which the bidder may want to conceal or partially reveal, but importantly, not to misreport. We refer to bidders with such behaviors as strategically reticent bidders. Among others, one direct motivation of our model is the ad auction in which many ad platforms today elicit from each bidder not only their private value per conversion but also their private information about Internet users (e.g., their conversion history) in order to improve the platform’s estimation of all bidders’ conversion rates. We show that in this new setup, it is still possible to design mechanisms that are both Incentive and Information Compatible (IIC). We develop two different black-box transformations, which convert any mechanism M for classic bidders to a mechanism M′ for strategically reticent bidders, based on either outcome of expectation or expectation of outcome, respectively. We identify properties of the original mechanismMunder which the transformation leads to IIC mechanismsM′ . Interestingly, as corollaries of these results, we show that running VCG with expected bidder values maximizes welfare whereas the mechanism using expected outcome of Myerson’s auction maximizes revenue. Finally, we study how regulation on the auctioneer’s usage of information may lead to more robust mechanisms. View details
    Online Learning via Offline Greedy: Applications in Market Design and Optimization
    Rad Niazadeh
    Negin Golrezaei
    Joshua Wang
    Fransisca Susan
    EC 2021, Management Science Journal (2020)
    Preview abstract Motivated by online decision-making in time-varying combinatorial environments, we study the problem of transforming offline algorithms to their online counterparts. We focus on offline combinatorial problems that are amenable to a constant factor approximation using a greedy algorithm that is robust to local errors. For such problems, we provide a general framework that efficiently transforms offline robust greedy algorithms to online ones using Blackwell approachability. We show that the resulting online algorithms have O(T^{1/2}) (approximate) regret under the full information setting. We further introduce a bandit extension of Blackwell approachability that we call Bandit Blackwell approachability. We leverage this notion to transform greedy robust offline algorithms into a O(T^{2/3})(approximate) regret in the bandit setting. Demonstrating the flexibility of our framework, we apply our offline-to-online transformation to several problems at the intersection of revenue management, market design, and online optimization, including product ranking optimization in online platforms, reserve price optimization in auctions, and submodular maximization. We show that our transformation, when applied to these applications, leads to new regret bounds or improves the current known bounds. View details
    Preview abstract In this paper, we introduce a novel technique forconstrained submodular maximization, inspired by barrier functions in continuous optimization.This connection not only improves the running time for constrained submodular maximization but also provides the state of the art guarantee. More precisely, for maximizing a monotone submodular function subject to the combination of a k-matchoid and l-knapsack constraint (for l≤k), we propose a potential function that can be approximately minimized. Once we minimize the potential function up to an ε error it is guaranteed that we have found a feasible set with a 2(k+1+ε)-approximation factor which can indeed be further improved to (k+1+ε)by an enumeration technique. We extensively evaluate the performance of our proposed algorithm over several real-world applications, including a movie recommendation system, summarization tasks for YouTube videos, Twitter feeds and Yelp business locations, and a set cover problem. View details
    Preview abstract Companies like Google and Microsoft run billions of auctions every day to sell advertising opportunities. Any change to the rules of these auctions can have a tremendous effect on the revenue of the company and the welfare of the advertisers and the users. Therefore, any change requires careful evaluation of its potential impacts. Currently, such impacts are often evaluated by running simulations or small controlled experiments. This, however, misses the important factor that the advertisers respond to changes. Our goal is to build a theoretical framework for predicting the actions of an agent (the advertiser) that is optimizing her actions in an uncertain environment. We model this problem using a variant of the multi armed bandit setting where playing an arm is costly. The cost of each arm changes over time and is publicly observable. The value of playing an arm is drawn stochastically from a static distribution and is observed by the agent and not by us. We, however, observe the actions of the agent. Our main result is that assuming the agent is playing a strategy with a regret of at most f(T) within the first T rounds, we can learn to play the multi-armed bandits game without observing the rewards) in such a way that the regret of our selected actions is at most O(k^4 (f(T) + 1) log(T)). View details
    Preview abstract Autobidding is becoming increasingly important in the domain of online advertising, and has become a critical tool used by many advertisers for optimizing their ad campaigns. We formulate fundamental questions around the problem of bidding for performance under very general affine cost constraints. We design optimal single-agent bidding strategies for the general bidding problem, in multi-slot truthful auctions. We show that there is an intimate connection between bidding and auction design, in that the bidding formula is optimal if and only if the underlying auction is truthful. We show how a MWU algorithm can be used to learn this optimal bidding formula. Next, we move from the single-agent view to taking a full-system view: What happens when all advertisers adopt optimal autobidding? We prove that in general settings, there exists an equilibrium between the bidding agents for all the advertisers. Further, we prove a Price of Anarchy result: For the general affine constraints, the total value (conversions) obtained by the advertisers in the bidding agent equilibrium is no less than 1/2 of what we could generate via a centralized ad allocation scheme, one which does not consider any auction incentives or provide any per-advertiser guarantee. View details
    Preview abstract Modern ad auctions allow advertisers to target more specific segments of the user population. Unfortunately, this is not always in the best interest of the ad platform – partially hiding some information could be more beneficial for the platform’s revenue. In this paper, we examine the following basic question in the context of second-price ad auctions: how should an ad platform optimally reveal information about the ad opportunity to the advertisers in order to maximize revenue? We consider a model in which bidders’ valuations depend on a random state of the ad opportunity. Different from previous work, we focus on a more practical, and challenging, situation where the space of possible realizations of ad opportunities is extremely large. We thus focus on developing algorithms whose running time is polynomial in the number of bidders, but is independent of the number of ad opportunity realizations. We assume that the auctioneer can commit to a signaling scheme to reveal noisy information about the realized state of the ad opportunity, and examine the auctioneer’s algorithmic question of designing the optimal signaling scheme. We first consider that the auctioneer is restricted to send a public signal to all bidders. As a warm-up, we start with a basic (though less realistic) setting in which the auctioneer knows the bidders’ valuations, and show that an -optimal scheme can be implemented in time polynomial in the number of bidders and 1/. We then move to a well-motivated Bayesian valuation setting in which the auctioneer and bidders both have private information, and present two results. First, we exhibit a characterization result regarding approximately optimal schemes and prove that any constant-approximate public signaling scheme must use exponentially many signals. Second, we present a “simple” public signaling scheme that serves as a constant approximation under mild assumptions. Finally, we initiate an exploration on the power of being able to send different signals privately to different bidders. In the basic setting where the auctioneer knows bidders’ valuations, we exhibit a polynomial-time private scheme that extracts almost full surplus even in the worst Bayes Nash equilibrium. This illustrates the surprising power of private signaling schemes in extracting revenue. View details
    Preview abstract Can we summarize multi-category data based on user preferences in a scalable manner? Many utility functions used for data summarization satisfy submodularity, a natural diminishing returns property. We cast personalized data summarization as an instance of a general submodular maximization problem subject to multiple constraints. We develop the first practical and FAst coNsTrained submOdular Maximization algorithm, FANTOM, with strong theoretical guarantees. FANTOM maximizes a submodular function (not necessarily monotone) subject to the intersection of a p-system and l knapsacks constrains. It achieves a (1+ϵ)(p+1)(2p+2l+1)/p approximation guarantee with only O( nrp log(n)/\epsilon) query complexity (n and r indicate the size of the ground set and the size of the largest feasible solution, respectively). We then show how we can use FANTOM for personalized data summarization. In particular, a p-system can model different aspects of data, such as categories or time stamps, from which the users choose. In addition, knapsacks encode users’ constraints including budget or time. In our set of experiments, we consider several concrete applications: movie recommendation over 11K movies, personalized image summarization with 10K images, and revenue maximization on the YouTube social networks with 5000 communities. We observe that FANTOM constantly provides the highest utility against all the baselines. View details
    Locally adaptive optimization: adaptive seeding for monotone submodular functions
    Christos Papadimitriou
    Aviad Rubinstein
    Lior Seeman
    Yaron Singer
    SODA (2016)
    Preview abstract The Adaptive Seeding problem is an algorithmic challenge motivated by influence maximization in social networks: One seeks to select among certain accessible nodes in a network, and then select, adaptively, among neighbors of those nodes as they become accessible in order to maximize a global objective function. More generally, adaptive seeding is a stochastic optimization framework where the choices in the first stage affect the realizations in the second stage, over which we aim to optimize. Our main result is a (1 - 1/e)^2-approximation for the adaptive seeding problem for any monotone submodular function. While adaptive policies are often approximated via non-adaptive policies, our algorithm is based on a novel method we call locally-adaptive policies. These policies combine a non-adaptive global structure, with local adaptive optimizations. This method enables the (1 - 1/e)^2-approximation for general monotone submodular functions and circumvents some of the impossibilities associated with non-adaptive policies. We also introduce a fundamental problem in submodular optimization that may be of independent interest: given a ground set of elements where every element appears with some small probability, find a set of expected size at most k that has the highest expected value over the realization of the elements. We show a surprising result: there are classes of monotone submodular functions (including coverage) that can be approximated almost optimally as the probability vanishes. For general monotone submodular functions we show via a reduction from Planted-Clique that approximations for this problem are not likely to be obtainable. This optimization problem is an important tool for adaptive seeding via non-adaptive policies, and its hardness motivates the introduction of locally-adaptive policies we use in the main result. View details
    Lazier Than Lazy Greedy
    Baharan Mirzasoleiman
    Amin Karbasi
    Jan Vondrák
    Andreas Krause
    AAAI (2015)
    Preview
    Preview abstract How can one find a subset, ideally as small as possible, that well represents a massive dataset? I.e., its corresponding utility, measured according to a suitable utility function, should be comparable to that of the whole dataset. In this paper, we formalize this challenge as a submodular cover problem. Here, the utility is assumed to exhibit submodularity, a natural diminishing returns condition prevalent in many data summarization applications. The classical greedy algorithm is known to provide solutions with logarithmic approximation guarantees compared to the optimum solution. However, this sequential, centralized approach is impractical for truly large-scale problems. In this work, we develop the first distributed algorithm – DISCOVER – for submodular set cover that is easily implementable using MapReduce-style computations. We theoretically analyze our approach, and present approximation guarantees for the solutions returned by DISCOVER. We also study a natural trade-off between the communication cost and the number of rounds required to obtain such a solution. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including active set selection, exemplar based clustering, and vertex cover on tens of millions of data points using Spark. View details
    Streaming Submodular Maximization: Massive Data Summarization on the Fly
    Baharan Mirzasoleiman
    Amin Karbasi
    Andreas Krause
    KDD 2014
    Robust Multi-objective Learning with Mentor Feedback
    Alekh Agarwal
    Miroslav Dudik
    Robert E. Schapire
    Aleksandrs Slivkins
    COLT 2014
    Fast algorithms for maximizing submodular functions
    Jan Vondrak
    SODA 2014
    Resourceful contextual bandits
    John Langford
    Aleksandrs Slivkins
    COLT 2014
    Bandits with Knapsacks
    Robert Kleinberg
    Aleksandrs Slivkins
    IEEE Symposium on Foundations of Computer Science (FOCS) (2013)
    Learning on a Budget: Posted Price Mechanisms for Online Procurement
    Robert Kleinberg
    Yaron Singer
    EC 2012
    Optimization with Demand Oracles
    Shahar Dobzinski
    Sigal Oren
    EC 2012
    Sketching Valuation Functions
    Shahar Dobzinski
    Hu Fu
    Robert Kleinberg
    Noam Nisan
    Tim Roughgarden
    SODA 2012
    Approximating Low-Dimensional Coverage Problems
    Robert Kleinberg
    Hooyeon Lee
    SOCG 2012
    Buyback Problem - Approximate matroid intersection with cancellation costs
    Randomized Online Algorithms for the Buyback Problem
    Robert Kleinberg
    WINE 2009
    On Tradeoff Between Network Connectivity, Phase Complexity and Communication Complexity of Reliable Communication Tolerating Mixed Adversary
    Arpita Patra
    Ashish Choudhary
    Kannan Srinathan
    C. Pandu Rangan
    PODC 2008