Jump to Content
Balasubramanian Sivan

Balasubramanian Sivan

Balasubramanian Sivan is a Research Scientist at Google New York. His research interests are in Algorithmic Game Theory, Online + Approximation algorithms and Online Learning. He got his undergraduate degree in Computer Science from Indian Institute of Technology Madras (2008) and PhD in Computer Science (2013) from the University of Wisconsin-Madison advised by Prof. Shuchi Chawla, and joined Google in August 2015 after spending two years at Microsoft Research Redmond as a postdoctoral researcher. His PhD thesis on Prior Robust Optimization received the ACM SIGecom doctoral dissertation award. See his personal webpage http://pages.cs.wisc.edu/~balu2901/ for more details on his publications.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Blackwell's celebrated theory measures approachability using the $\ell_2$ (Euclidean) distance. In many applications such as regret minimization, it is often more useful to study approachability under other distance metrics, most commonly the $\ell_\infty$ metric. However, the time and space complexity of the algorithms designed for $\ell_\infty$ approachability depend on the dimension of the space of the vectorial payoffs, which is often prohibitively large. We present a framework for converting high-dimensional $\ell_\infty$ approachability problems to low-dimensional \emph{pseudonorm} approachability problems, thereby resolving such issues. We first show that the $\ell_\infty$ distance between the average payoff and the approachability set can be equivalently defined as a \emph{pseudodistance} between a lower-dimensional average vector payoff and a new convex convex set we define. Next, we develop an algorithmic theory of pseudonorm approachability analogous to previous work norm approachability showing that it can be achieved via online linear optimization (OLO) over a convex set given by the Fenchel dual of the unit pseudonorm ball. We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$ approachability algorithm whose convergence is independent of the dimension of the original vector payoff. We further show that that algorithm admits a polynomial-time complexity, assuming that the original $\ell_\infty$-distance can be computed efficiently. We also give an $\ell_\infty$ approachability algorithm whose convergence is logarithmic in that dimension using an FTRL algorithm with a maximum-entropy regularizer. Finally, we illustrate the benefits of our framework by applying it to several problems in regret minimization. View details
    Preview abstract We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution, but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer is guaranteed to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer payoff at this minimum? (c) can these generalizations be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, that could be of independent interest in other settings. View details
    Preview abstract In the Learning to Price setting, a seller posts prices over time with the goal of maximizing revenue while learning the buyer's valuation. This problem is very well understood when values are stationary (fixed or iid). Here we study the problem where the buyer's value is a moving target, i.e., they change over time either by a stochastic process or adversarially with bounded variation. In either case, we provide matching upper and lower bounds on the optimal revenue loss. Since the target is moving, any information learned soon becomes out-dated, which forces the algorithms to keep switching between exploring and exploiting phases. View details
    Welfare-maximizing Guaranteed Dashboard Mechanisms
    Jason Hartline
    Jieming Mao
    Proceedings of the 22nd ACM Conference on Economics and Computation (2021), pp. 370
    Preview abstract Bidding dashboards are used in online marketplaces to aid a bidder in computing good bidding strategies, particularly when the auction used by the marketplace is constrained to have the winners-pay-bid payment format. A dashboard predicts the outcome a bidder can expect to get at each possible bid. To convince a bidder to best respond to the information published in a dashboard, a dashboard mechanism should ensure either (a) that best responding maximizes the bidder's utility (a weaker requirement) or (b) that the mechanism implements the outcome published in the dashboard (a stronger requirement that subsumes (a)). Recent work by Hartline et al. EC'19 formalized the notion of dashboard mechanisms and designed winners-pay-bid mechanisms that guaranteed epsilon-optimal utility (an epsilon-approximate version of (a)), but not (b). I.e., the mechanism could end up implementing arbitrarily different outcomes from what was promised. While this guarantee is sufficient from a purely technical perspective, it is far from enough in the real world: it is hard to convince bidders to best respond to information which could be arbitrarily inaccurate, regardless of the theoretical promise of near-optimality. In this paper we study guaranteed dashboard mechanisms, namely, ones that are guaranteed to implement what they publish, and obtain good welfare. We study this question in a repeated auction setting for general single-dimensional valuations and give tight characterizations of the loss in welfare as a function of natural parameters upper bounding the difference in valuation profile across the rounds. In particular, we give three different characterizations, bounding the loss in welfare in terms of the 0 norm, 1 norm and infinite norm of difference in valuation profile across rounds. All the characterizations generalize at least up to matroid feasibility constraints, and the infinite norm characterization extends to general downward-closed feasibility constraints. We bring to bear different techniques for each of these characterizations, including connections to differential privacy and online convex optimizations. View details
    Preview abstract We consider a setting in which bidders participate in multiple auctions run by different sellers, and optimize their bids for the \emph{aggregate} auction. We analyze this setting by formulating a game between sellers, where a seller's strategy is to pick an auction to run. Our analysis aims to shed light on the recent change in the Display Ads market landscape: here, ad exchanges (sellers) were mostly running second price auctions earlier and over time they switched to variants of the first price auction, culminating in Google's Ad Exchange moving to a first price auction in 2019. Our model and results offer an explanation for why the first price auction occurs as a natural equilibrium in such competitive markets. View details
    Preview abstract We study the fundamental problem of selling a single indivisible good to one ofnbuyers with independentvaluations. We seek to design improved approximations to the optimal revenue achievable through two simpleand widely used mechanisms: second price auction with eager personalized reserve prices, and sequentialposted price mechanisms. Until recently, the best known approximation for both these mechanisms was 1−1e.We give improved approximations of 1−1e+0.022∼0.6543 for the sequential posted price mechanism and1−1e+0.029∼0.662 for the second price auction with eager reserve prices. We also make some progresstowards the problem of computing the optimal personalized eager reserve prices for a second price auction. View details
    Prior-Free Dynamic Auctions with Low Regret Buyers
    Advances in Neural Information Processing Systems (2019), pp. 4803-4813
    Preview abstract We study the problem of how to repeatedly sell to a buyer running a no-regret, mean-based algorithm. Previous work (Braverman et al., EC '18) shows that it is possible to design effective mechanisms in such a setting that extract almost all of the economic surplus, but these mechanisms require the buyer's values each round to be drawn iid from a fixed distribution. In this paper, we do away with this assumption and consider the {\it prior-free setting} where the buyer's value each round is chosen adversarially (possibly adaptively). We show that even in this prior-free setting, it is possible to extract a $(1-\varepsilon)$-approximation of the full economic surplus for any $\varepsilon > 0$. The menu complexity of our mechanism (the number of options offered to a buyer in any round) scales independently of the number of rounds $T$ and polynomially in $\varepsilon$. We show that this is optimal up to a polynomial factor; any mechanism achieving this approximation factor, even when values are drawn stochastically, requires menu complexity at least $\Omega(1/\varepsilon)$. Finally, we examine what is possible when we constrain our mechanism to a natural auction format where overbidding is dominated. Braverman et al. show that even when values are drawn from a known stochastic distribution supported on $[1/H, 1]$, it is impossible in general to extract more than $O(\log\log H / \log H)$ of the economic surplus. We show how to achieve the same approximation factor in the {\it prior-independent} setting (where the distribution is unknown to the seller), and an approximation factor of $O(1 / \log H)$ in the prior-free setting. View details
    Strategizing against No-regret Learners
    Advances in Neural Information Processing Systems (2019), pp. 1579-1587
    Preview abstract How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot get any higher utility than the Stackelberg equilibrium utility. But when the no-regret learner has more than two actions and plays a mean-based no-regret strategy, we show that the player can get strictly higher than the Stackelberg equilibrium utility. We provide a characterization of the optimal game-play for the player against a mean-based no-regret learner as a solution to a control problem. When the no-regret learner's strategy also guarantees him a no-swap regret, we show that the player cannot get anything higher than a Stackelberg equilibrium utility. View details
    Testing Incentive Compatibility in Display Ad Auctions
    Sebastien Lahaie
    Andres Munoz Medina
    Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018
    Preview abstract Consider a buyer participating in a repeated auction, such as those prevalent in display advertising. How would she test whether the auction is incentive compatible? To bid effectively, she is interested in whether the auction is single-shot incentive compatible---a pure second-price auction, with fixed reserve price---and also dynamically incentive compatible---her bids are not used to set future reserve prices. In this work we develop tests based on simple bid perturbations that a buyer can use to answer these questions, with a focus on dynamic incentive compatibility. There are many potential A/B testing setups that one could use, but we find that many natural experimental designs are, in fact, flawed. For instance, we show that additive perturbations can lead to paradoxical results, where higher bids lead to lower optimal reserve prices. We precisely characterize this phenomenon and show that reserve prices are only guaranteed to be monotone for distributions satisfying the Monotone Hazard Rate (MHR) property. The experimenter must also decide how to split traffic to apply systematic perturbations. It is tempting to have this split be randomized, but we demonstrate empirically that unless the perturbations are aligned with the partitions used by the seller to compute reserve prices, the results are guaranteed to be inconclusive. We validate our results with experiments on real display auction data and show that a buyer can quantify both single-shot and dynamic incentive compatibility even under realistic conditions where only the cost of the impression is observed (as opposed to the exact reserve price). We analyze the cost of running such experiments, exposing trade-offs between test accuracy, cost, and underlying market dynamics. View details
    Robust Repeated Auctions Under Heterogeneous Buyer Behavior
    Shipra Agrawal
    Constantinos Daskalakis
    Proceedings of the Nineteenth ACM Conference on Economics and Computation, EC '18 (2018)
    Preview abstract We study revenue optimization in a repeated auction between a single seller and a single buyer. Traditionally, the design of repeated auctions requires strong modeling assumptions about the bidder behavior, such as it being myopic, infinite lookahead, or some specific form of learning behavior. Is it possible to design mechanisms which are simultaneously optimal against a multitude of possible buyer behaviors? We answer this question by designing a simple state-based mechanism that is simultaneously approximately optimal against a k-lookahead buyer for all k, a buyer who is a no-regret learner, and a buyer who is a policy-regret learner. Against each type of buyer our mechanism attains a constant fraction of the optimal revenue attainable against that type of buyer. We complement our positive results with almost tight impossibility results, showing that the revenue approximation tradeoffs achieved by our mechanism for different lookahead attitudes are near-optimal. View details
    Truthful Multi-Parameter Auctions with Online Supply: An Impossible Combination
    Nikhil R. Devanur
    Vasilis Syrgkanis
    Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018
    Preview abstract We study a basic auction design problem with online supply. There are two unit-demand bidders and two types of items. The first item type will arrive first for sure, and the second item type may or may not arrive. The auctioneer has to decide the allocation of an item immediately after each item arrives, but is allowed to compute payments after knowing how many items arrived. For this problem we show that there is no deterministic truthful and individually rational mechanism that, even with unbounded computational resources, gets any finite approximation factor to the optimal social welfare. View details
    Tight Lower Bounds for Multiplicative Weights Algorithmic Families
    Nick Gravin
    Yuval Peres
    44th International Colloquium on Automata, Languages, and Programming, ICALP 2017
    Preview abstract We study the fundamental problem of prediction with expert advice and develop regret lower bounds for a large family of algorithms for this problem. We develop simple adversarial primitives, that lend themselves to various combinations leading to sharp lower bounds for many algorithmic families. We use these primitives to show that the classic Multiplicative Weights Algorithm (MWA) has a regret of $\sqrt{\frac{T \ln k}{2}}$ (where T is the time horizon and k is the number of experts), there by completely closing the gap between upper and lower bounds. We further show a regret lower bound of $\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$ for a much more general family of algorithms than MWA, where the learning rate can be arbitrarily varied over time, or even picked from arbitrary distributions over time. We also use our primitives to construct adversaries in the geometric horizon setting for MWA to precisely characterize the regret at $\frac{0.391}{\sqrt{\delta}}$ for the case of 2 experts and a lower bound of $\frac{1}{2}\sqrt{\frac{\ln k}{2\delta}}$ for the case of arbitrary number of experts k (here $\delta$ is the probability that the game ends in any given round). View details
    Stability of Service Under Time-of-use Pricing
    Shuchi Chawla
    Nikhil R. Devanur
    Alexander E. Holroyd
    Anna R. Karlin
    James B. Martin
    Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017
    Preview abstract We consider time-of-use pricing as a technique for matching supply and demand of temporal resources with the goal of maximizing social welfare. Relevant examples include energy, computing resources on a cloud computing platform, and charging stations for electric vehicles, among many others. A client/job in this setting has a window of time during which he needs service, and a particular value for obtaining it. We assume a stochastic model for demand, where each job materializes with some probability via an independent Bernoulli trial. Given a per-time-unit pricing of resources, any realized job will first try to get served by the cheapest available resource in its window and, failing that, will try to find service at the next cheapest available resource, and so on. Thus, the natural stochastic fluctuations in demand have the potential to lead to cascading overload events. Our main result shows that setting prices so as to optimally handle the expected demand works well: with high probability, when the actual demand is instantiated, the system is stable and the expected value of the jobs served is very close to that of the optimal offline algorithm. View details
    Simple Pricing Schemes for Consumers with Evolving Values
    Shuchi Chawla
    Nikhil R. Devanur
    Anna R. Karlin
    Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016
    Preview abstract We consider a pricing problem where a buyer is interested in purchasing/using a good, such as an app or music or software, repeatedly over time. The consumer discovers his value for the good only as he uses it, and the value evolves with each use. Optimizing for the seller's revenue in such dynamic settings is a complex problem and requires assumptions about how the buyer behaves before learning his future value(s), and in particular, how he reacts to risk. We explore the performance of a class of pricing mechanisms that are extremely simple for both the buyer and the seller to use: the buyer reacts to prices myopically without worrying about how his value evolves in the future; the seller needs to optimize for revenue over a space of only two parameters, and can do so without knowing the buyer's risk profile or fine details of the value evolution process. We present simple-versus-optimal type results, namely that under certain assumptions, simple pricing mechanisms of the above form are approximately optimal regardless of the buyer's risk profile. Our results assume that the buyer's value per usage evolves as a martingale. For our main result, we consider pricing mechanisms in which the seller offers the product for free for a certain number of uses, and then charges an appropriate fixed price per usage. We assume that the buyer responds by buying the product for as long as his value exceeds the fixed price. Importantly, the buyer does not need to know anything about how his future value will evolve, only how much he wants to use the product right now. Regardless of the buyers' initial value, our pricing captures as revenue a constant fraction of the total value that the buyers accumulate in expectation over time. View details
    Multi-Score Position Auctions
    Denis Charles
    Nikhil R. Devanur
    Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, WSDM 2016
    Preview abstract In this paper we propose a general family of position auctions used in paid search, which we call multi-score position auctions. These auctions contain the GSP auction and the GSP auction with squashing as special cases. We show experimentally that these auctions contain special cases that perform better than the GSP auction with squashing, in terms of revenue, and the number of clicks on ads. In particular, we study in detail the special case that squashes the first slot alone and show that this beats pure squashing (which squashes all slots uniformly). We study the equilibria that arise in this special case to examine both the first order and the second order effect of moving from the squashing-all-slots auction to the squash-only-the-top-slot auction. For studying the second order effect, we simulate auctions using the value-relevance correlated distribution suggested in Lahaie and Pennock [2007]. Since this distribution is derived from a study of value and relevance distributions in Yahoo! we believe the insights derived from this simulation to be valuable. For measuring the first order effect, in addition to the said simulation, we also conduct experiments using auction data from Bing over several weeks that includes a random sample of all auctions. View details
    Towards Optimal Algorithms for Prediction with Expert Advice
    Nick Gravin
    Yuval Peres
    Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2016
    Preview abstract We study the classical problem of prediction with expert advice in the adversarial setting with a geometric stopping time. In 1965, Cover'65 gave the optimal algorithm for the case of 2 experts. In this paper, we design the optimal algorithm, adversary and regret for the case of 3 experts. Further, we show that the optimal algorithm for 2 and 3 experts is a probability matching algorithm (analogous to Thompson sampling) against a particular randomized adversary. Remarkably, our proof shows that the probability matching algorithm is not only optimal against this particular randomized adversary, but also minimax optimal. Our analysis develops upper and lower bounds simultaneously, analogous to the primal-dual method. Our analysis of the optimal adversary goes through delicate asymptotics of the random walk of a particle between multiple walls. We use the connection we develop to random walks to derive an improved algorithm and regret bound for the case of 4 experts, and, provide a general framework for designing the optimal algorithm and adversary for an arbitrary number of experts. View details
    Revenue Maximization with Nonexcludable Goods
    Nima Haghpanah
    Internet and Network Economics - 9th International Workshop, WINE 2013, Springer
    Multi-parameter Mechanism Design and Sequential Posted Pricing
    Shuchi Chawla
    Jason D. Hartline
    David Malec
    Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010., ACM (2010), pp. 311-320