Balasubramanian Sivan

Balasubramanian Sivan

Balasubramanian Sivan is a Research Scientist at Google New York. His research interests are in Algorithmic Game Theory, Online + Approximation algorithms and Online Learning. He got his undergraduate degree in Computer Science from Indian Institute of Technology Madras (2008) and PhD in Computer Science (2013) from the University of Wisconsin-Madison advised by Prof. Shuchi Chawla, and joined Google in August 2015 after spending two years at Microsoft Research Redmond as a postdoctoral researcher. His PhD thesis on Prior Robust Optimization received the ACM SIGecom doctoral dissertation award. See his personal webpage http://pages.cs.wisc.edu/~balu2901/ for more details on his publications.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Optimal Pricing Schemes for an Impatient Buyer
    Kangning Wang
    Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms(2023), pp. 382-398
    Preview abstract A patient seller aims to sell a good to an impatient buyer (i.e., one who discounts utility over time).The buyer will remain in the market for a period of time T , and her private value is drawn from a publicly known distribution. What is the revenue-optimal pricing-curve (sequence of (price, time) pairs) for the seller? Is randomization of help here? Is the revenue-optimal pricing-curve computable in polynomial time? We answer these questions in this paper. We give an efficient algorithm for computing the revenue-optimal pricing curve. We show that pricing curves, that post a price at each point of time and let the buyer pick her utility maximizing time to buy, are revenue-optimal among a much broader class of sequential lottery mechanisms: namely, mechanisms that allow the seller to post a menu of lotteries at each point of time cannot get any higher revenue than pricing curves. We also show that the even broader class of mechanisms that allow the menu of lotteries to be adaptively set, can earn strictly higher revenue than that of pricing curves, and the revenue gap can be as big as the support size of the buyer’s value distribution. View details
    Preview abstract Blackwell's celebrated theory measures approachability using the $\ell_2$ (Euclidean) distance. In many applications such as regret minimization, it is often more useful to study approachability under other distance metrics, most commonly the $\ell_\infty$ metric. However, the time and space complexity of the algorithms designed for $\ell_\infty$ approachability depend on the dimension of the space of the vectorial payoffs, which is often prohibitively large. We present a framework for converting high-dimensional $\ell_\infty$ approachability problems to low-dimensional \emph{pseudonorm} approachability problems, thereby resolving such issues. We first show that the $\ell_\infty$ distance between the average payoff and the approachability set can be equivalently defined as a \emph{pseudodistance} between a lower-dimensional average vector payoff and a new convex convex set we define. Next, we develop an algorithmic theory of pseudonorm approachability analogous to previous work norm approachability showing that it can be achieved via online linear optimization (OLO) over a convex set given by the Fenchel dual of the unit pseudonorm ball. We then use that to show, modulo mild normalization assumptions, that there exists an $\ell_\infty$ approachability algorithm whose convergence is independent of the dimension of the original vector payoff. We further show that that algorithm admits a polynomial-time complexity, assuming that the original $\ell_\infty$-distance can be computed efficiently. We also give an $\ell_\infty$ approachability algorithm whose convergence is logarithmic in that dimension using an FTRL algorithm with a maximum-entropy regularizer. Finally, we illustrate the benefits of our framework by applying it to several problems in regret minimization. View details
    Preview abstract Motivated by the online advertising industry, we study the non-stationary stochastic budget management problem: An advertiser repeatedly participates in $T$ second-price auctions, where her value and the highest competing bid are drawn from unknown time-varying distributions, with the goal of maximizing her total utility subject to her budget constraint. In the absence of any information about the distributions, it is known that sub-linear regret cannot be achieved. We assume access to historical samples, with the goal of developing algorithms that are robust to discrepancies between the sampling distributions and the true distributions. We show that our Dual Follow-The-Regularized-Leader algorithm is robust and achieves a near-optimal $\tilde O(\sqrt{T})$-regret with just one sample per distribution, drastically improving over the best-known sample-complexity of $T$ samples per distribution. View details
    Preview abstract We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution, but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer is guaranteed to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer payoff at this minimum? (c) can these generalizations be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, that could be of independent interest in other settings. View details
    Approximately Efficient Bilateral Trade
    Kangning Wang
    Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing(2022), 718–721
    Preview abstract We study bilateral trade between two strategic agents. The celebrated result of Myerson and Satterthwaite states that in general, no incentive-compatible, individually rational and weakly budget balanced mechanism can be efficient. I.e., no mechanism with these properties can guarantee a trade whenever buyer value exceeds seller cost. Given this, a natural question is whether there exists a mechanism with these properties that guarantees a constant fraction of the first-best gains-from-trade, namely a constant fraction of the gains-from-trade attainable whenever buyer’s value weakly exceeds seller’s cost. In this work, we positively resolve this long-standing open question on constant-factor approximation, mentioned in several previous works, using a simple mechanism that obtains a 1/8.23 ≈ 0.121 fraction of the first-best. View details
    Preview abstract In the Learning to Price setting, a seller posts prices over time with the goal of maximizing revenue while learning the buyer's valuation. This problem is very well understood when values are stationary (fixed or iid). Here we study the problem where the buyer's value is a moving target, i.e., they change over time either by a stochastic process or adversarially with bounded variation. In either case, we provide matching upper and lower bounds on the optimal revenue loss. Since the target is moving, any information learned soon becomes out-dated, which forces the algorithms to keep switching between exploring and exploiting phases. View details
    Welfare-maximizing Guaranteed Dashboard Mechanisms
    Jason Hartline
    Proceedings of the 22nd ACM Conference on Economics and Computation(2021), pp. 370
    Preview abstract Bidding dashboards are used in online marketplaces to aid a bidder in computing good bidding strategies, particularly when the auction used by the marketplace is constrained to have the winners-pay-bid payment format. A dashboard predicts the outcome a bidder can expect to get at each possible bid. To convince a bidder to best respond to the information published in a dashboard, a dashboard mechanism should ensure either (a) that best responding maximizes the bidder's utility (a weaker requirement) or (b) that the mechanism implements the outcome published in the dashboard (a stronger requirement that subsumes (a)). Recent work by Hartline et al. EC'19 formalized the notion of dashboard mechanisms and designed winners-pay-bid mechanisms that guaranteed epsilon-optimal utility (an epsilon-approximate version of (a)), but not (b). I.e., the mechanism could end up implementing arbitrarily different outcomes from what was promised. While this guarantee is sufficient from a purely technical perspective, it is far from enough in the real world: it is hard to convince bidders to best respond to information which could be arbitrarily inaccurate, regardless of the theoretical promise of near-optimality. In this paper we study guaranteed dashboard mechanisms, namely, ones that are guaranteed to implement what they publish, and obtain good welfare. We study this question in a repeated auction setting for general single-dimensional valuations and give tight characterizations of the loss in welfare as a function of natural parameters upper bounding the difference in valuation profile across the rounds. In particular, we give three different characterizations, bounding the loss in welfare in terms of the 0 norm, 1 norm and infinite norm of difference in valuation profile across rounds. All the characterizations generalize at least up to matroid feasibility constraints, and the infinite norm characterization extends to general downward-closed feasibility constraints. We bring to bear different techniques for each of these characterizations, including connections to differential privacy and online convex optimizations. View details
    Preview abstract We study the fundamental problem of selling a single indivisible good to one ofnbuyers with independentvaluations. We seek to design improved approximations to the optimal revenue achievable through two simpleand widely used mechanisms: second price auction with eager personalized reserve prices, and sequentialposted price mechanisms. Until recently, the best known approximation for both these mechanisms was 1−1e.We give improved approximations of 1−1e+0.022∼0.6543 for the sequential posted price mechanism and1−1e+0.029∼0.662 for the second price auction with eager reserve prices. We also make some progresstowards the problem of computing the optimal personalized eager reserve prices for a second price auction. View details
    Preview abstract We consider a setting in which bidders participate in multiple auctions run by different sellers, and optimize their bids for the \emph{aggregate} auction. We analyze this setting by formulating a game between sellers, where a seller's strategy is to pick an auction to run. Our analysis aims to shed light on the recent change in the Display Ads market landscape: here, ad exchanges (sellers) were mostly running second price auctions earlier and over time they switched to variants of the first price auction, culminating in Google's Ad Exchange moving to a first price auction in 2019. Our model and results offer an explanation for why the first price auction occurs as a natural equilibrium in such competitive markets. View details
    Strategizing against No-regret Learners
    Advances in Neural Information Processing Systems(2019), pp. 1579-1587
    Preview abstract How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot get any higher utility than the Stackelberg equilibrium utility. But when the no-regret learner has more than two actions and plays a mean-based no-regret strategy, we show that the player can get strictly higher than the Stackelberg equilibrium utility. We provide a characterization of the optimal game-play for the player against a mean-based no-regret learner as a solution to a control problem. When the no-regret learner's strategy also guarantees him a no-swap regret, we show that the player cannot get anything higher than a Stackelberg equilibrium utility. View details