
Market algorithms

Our mission is to analyze, design, and deliver economically and computationally efficient marketplaces across Google.

About the team

Our research in auction theory, mechanism design, and advanced algorithms serves to improve Ads and other market-based products.

Team focus summaries

Auction optimization for ad exchanges

As part of the display ads ecosystem, advertising exchanges pose many challenging optimization and algorithmic mechanism design problems. Examples of such research areas include auction design in the presence of a supply chain of auctioneers, optimal competition between reservation and spot markets, and reserve price optimization.
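One common empirical approach to reserve price optimization picks the reserve that would have maximized revenue on logged bids. The sketch below is illustrative only and does not describe any production system. It assumes single-item second-price auctions and assumes bidders would not change their bids in response to the reserve. The helper names are hypothetical.

```python
from typing import List, Sequence


def second_price_revenue(bids: Sequence[float], reserve: float) -> float:
    """Seller revenue from one second-price auction with a reserve price."""
    ranked = sorted(bids, reverse=True)
    if not ranked or ranked[0] < reserve:
        return 0.0  # no bid clears the reserve, so the item goes unsold
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return max(runner_up, reserve)  # winner pays max(second-highest bid, reserve)


def best_reserve(history: List[Sequence[float]], candidates: Sequence[float]) -> float:
    """Candidate reserve with the highest total revenue over logged auctions."""
    return max(candidates, key=lambda r: sum(second_price_revenue(b, r) for b in history))


# Hypothetical logged bids from three auctions.
logged = [[10.0, 2.0], [4.0, 3.0], [9.0, 1.0]]
print(best_reserve(logged, [0.0, 3.0, 5.0, 8.0]))  # → 8.0
```

In this toy data, a high reserve gives up the sale in the second auction but raises the price in the other two. On real data, both the candidate grid and the no-response assumption matter.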

Display ads research

The display ads ecosystem provides a rich platform for a variety of research problems in online stochastic optimization and computational economics. Examples of such areas are robust online allocation and optimal contract design in display advertising.

Dynamic mechanism design via bank accounts

The bulk of online ads are sold via repeated auctions. Instead of optimizing each auction separately, one can design stateful (dynamic) pricing and allocation strategies that optimize across auctions jointly. Dynamic mechanism design has been an active research area. However, most existing mechanisms are either too computationally complex or rely too heavily on forecasts of future auctions. We have designed a new family of dynamic mechanisms, called bank account mechanisms, and shown their effectiveness in designing oblivious dynamic mechanisms that can be implemented without forecasting the future.

Mechanism design with budgets

Budget constraints are a central issue in online advertising. Designing efficient mechanisms with good incentive properties is well understood in unbudgeted settings, but with budgets it is understood only for very simple settings. In this line of work, we develop efficient mechanisms for the more sophisticated budgeted settings that arise in internet advertising, such as online settings and polyhedral constraints.

Online stochastic matching

All online advertising systems employ online ad selection algorithms that satisfy various global constraints while optimizing different objectives. In this area, we have developed new algorithms for online stochastic matching, budgeted allocation, and more general variants of the problem such as submodular welfare maximization.
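A classic point of reference for budgeted allocation is the balance-style algorithm of Mehta, Saberi, Vazirani, and Vazirani for the AdWords problem. It discounts each bid by ψ(f) = 1 - e^(f - 1), where f is the fraction of the advertiser's budget already spent, so nearly exhausted advertisers are deprioritized. The sketch below is a textbook version and is not one of the team's algorithms. The function name and the input format (one dict of bids per query) are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def msvv_allocate(
    queries: List[Dict[str, float]], budgets: Dict[str, float]
) -> Tuple[List[Optional[str]], Dict[str, float]]:
    """Assign each arriving query to the advertiser with the largest discounted bid.

    The discount psi(f) = 1 - exp(f - 1) shrinks as the fraction f of budget spent grows.
    Returns the chosen advertiser per query (None if nobody can pay) and total spend.
    """
    spent = {a: 0.0 for a in budgets}
    choices: List[Optional[str]] = []
    for bids in queries:
        best, best_score = None, 0.0
        for adv, bid in bids.items():
            if bid <= 0 or spent[adv] >= budgets[adv]:
                continue  # no bid, or budget exhausted
            frac = spent[adv] / budgets[adv]
            score = bid * (1 - math.exp(frac - 1))
            if score > best_score:  # ties keep the earlier advertiser
                best, best_score = adv, score
        if best is not None:
            # Charge the bid, capped at the advertiser's remaining budget.
            spent[best] += min(bids[best], budgets[best] - spent[best])
        choices.append(best)
    return choices, spent


choices, spent = msvv_allocate([{"A": 1.0, "B": 1.0}] * 5, {"A": 2.0, "B": 2.0})
print(choices)  # ['A', 'B', 'A', 'B', None]
```

With equal bids, the discount alternates queries between the two advertisers instead of draining one budget first.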

Optimizing advertiser campaigns

Advertisers must constantly optimize their campaigns to keep up with changes in their goals, resources and the market itself. To help, Google provides bid automation tools, as well as suggestions for targeting, bid and budget changes. We have studied algorithmic questions in this area to improve these tools and suggestions.

Pricing via online learning

Each ad impression is unique in its combination of features, which makes impressions challenging to price accurately. We develop robust online learning algorithms that cope with an unpredictable supply of ads and balance the conflicting objectives of learning and earning in online pricing.
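The learning-versus-earning tension can be shown with a deliberately simplified model. A seller posts one price per arriving buyer from a fixed grid and chooses among prices with the standard UCB1 bandit rule. This sketch ignores impression features and robustness to supply, which are the actual focus here. The price grid, the assumption that prices lie in [0, 1], and the function name are all hypothetical.

```python
import math
from typing import List, Sequence


def ucb_posted_prices(values: Sequence[float], prices: Sequence[float]) -> List[float]:
    """Post one price per arriving buyer, choosing among a price grid with UCB1.

    Assumes prices lie in [0, 1], so per-round revenue is in [0, 1].
    A sale happens when the buyer's (unobserved) value is at least the price.
    """
    k = len(prices)
    counts = [0] * k
    revenue = [0.0] * k
    posted = []
    for t, value in enumerate(values, start=1):
        if t <= k:
            arm = t - 1  # learning: try every price once
        else:
            # Empirical mean revenue (earning) plus an exploration bonus (learning).
            arm = max(
                range(k),
                key=lambda i: revenue[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        price = prices[arm]
        counts[arm] += 1
        revenue[arm] += price if value >= price else 0.0
        posted.append(price)
    return posted


posted = ucb_posted_prices([0.6] * 1000, [0.2, 0.5, 0.8])
print(posted.count(0.5))
```

With every buyer valuing the item at 0.6, the exploration bonus for the lower and higher prices shrinks over time, and most rounds settle on 0.5.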

Incentive-aware learning

Many applications of machine learning arise in environments that are game-theoretic in nature. Such environments involve multiple self-interested agents whose actions affect the data points that the learning algorithm observes, and who are in turn affected by the outcome selected by the underlying optimization algorithm. The main challenge of learning in such environments is to design optimization algorithms that encourage agents to behave truthfully, resulting in a correct outcome.

Featured publications

In online advertising, advertisers purchase ad placements by participating in a long sequence of repeated auctions. One of the most important features advertising platforms often provide, and advertisers often use, is budget management, which allows advertisers to control their cumulative expenditures. Advertisers typically declare the maximum daily amount they are willing to pay, and the platform adjusts allocations and payments to guarantee that cumulative expenditures do not exceed budgets. There are multiple ways to achieve this goal. Each one, when applied to all budget-constrained advertisers simultaneously, steers the system toward a different equilibrium. Previous research focused on online stochastic optimization techniques or game-theoretic equilibria of such settings. Our goal in this paper is to compare the "system equilibria" of a range of budget management strategies in terms of the seller's profit and buyers' utility. In particular, we consider six different budget management strategies including probabilistic throttling, thresholding, bid shading, reserve pricing, and multiplicative boosting. We show these methods admit a system equilibrium in a rather general setting, and prove dominance relations between them in a simplified setting. Our study sheds light on the impact of budget management strategies on the tradeoff between the seller's profit and buyers' utility. Finally, we also empirically compare the system equilibria of these strategies using real ad auction data in sponsored search and randomly generated bids. The empirical study confirms our theoretical findings about the relative performance of budget management strategies.
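Bid shading in its simplest multiplicative form means bidding a fixed fraction α of value in every auction. Against fixed competing bids in second-price auctions, total spend is non-decreasing in α, so the largest α whose spend fits the budget can be found by bisection. This is a single-advertiser sketch under those assumptions, and the helper names are hypothetical. It does not compute the system equilibrium analyzed in the paper, and the paper's exact definition of each strategy may differ.

```python
from typing import Sequence


def spend(alpha: float, values: Sequence[float], competing: Sequence[float]) -> float:
    """Total payment when bidding alpha * value in second-price auctions.

    The advertiser wins auction t if alpha * values[t] > competing[t] and then
    pays the highest competing bid competing[t].
    """
    return sum(d for v, d in zip(values, competing) if alpha * v > d)


def pacing_multiplier(values, competing, budget: float, iters: int = 50) -> float:
    """Largest alpha in [0, 1] (up to bisection precision) with spend <= budget.

    Assumes non-negative competing bids and budget, so spend(0) = 0 <= budget.
    """
    if spend(1.0, values, competing) <= budget:
        return 1.0  # the budget never binds, so bid full value
    lo, hi = 0.0, 1.0  # invariant: spend(lo) <= budget < spend(hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if spend(mid, values, competing) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


alpha = pacing_multiplier([10.0, 10.0, 10.0], [2.0, 5.0, 8.0], budget=7.0)
print(round(alpha, 6))  # → 0.8
```

Here, shading to just under 0.8 wins the two cheaper auctions (spend 2 + 5 = 7) and drops the third, which would push spend to 15.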
Non-Clairvoyant Dynamic Mechanism Design
Pingzhong Tang
Proceedings of the 2018 ACM Conference on Economics and Computation, Ithaca, NY, USA, June 18-22, 2018, ACM, pp. 169
Despite their better revenue and welfare guarantees for repeated auctions, dynamic mechanisms have not been widely adopted in practice. This is partly due to the complexity of their implementation as well as their unrealistic use of forecasting for future periods. We address these shortcomings and present a new family of dynamic mechanisms that are simple and require no distributional knowledge of future periods. This paper introduces the concept of non-clairvoyance in dynamic mechanism design, which is a measure-theoretic restriction on the information that the seller is allowed to use. A dynamic mechanism is non-clairvoyant if the allocation and pricing rule at each period does not depend on the type distributions in the future periods. We develop a framework (bank account mechanisms) for characterizing, designing, and proving lower bounds for dynamic mechanisms (clairvoyant or non-clairvoyant). This framework is used to characterize the revenue extraction power of the non-clairvoyant mechanisms with respect to the mechanisms that are allowed unrestricted use of distributional knowledge.
A new dog learns old tricks: RL finds classic optimization algorithms
Weiwei Kong
Christopher Liaw
D. Sivakumar
Seventh International Conference on Learning Representations (ICLR) (2019)
We ask whether reinforcement learning can find theoretically optimal algorithms for online optimization problems, and introduce a novel learning framework in this setting. To answer this question, we introduce a number of key ideas from traditional algorithms and complexity theory. Specifically, we introduce the concept of adversarial distributions (universal and high-entropy training sets), which are distributions that encourage the learner to find algorithms that work well in the worst case. We test our new ideas on the AdWords problem, the online knapsack problem, and the secretary problem. Our results indicate that the models have learned behaviours that are consistent with the optimal algorithms for these problems derived using the online primal-dual framework.
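For reference, the textbook optimal policy for the classical secretary problem works in two phases. It observes roughly the first n/e candidates without accepting any, then accepts the first candidate better than all of them. This sketch states that classical rule; it is not the model learned in the paper, and the function name is hypothetical.

```python
import math
from typing import Sequence


def secretary_pick(values: Sequence[float]) -> int:
    """Index chosen by the classical 1/e stopping rule (values seen one at a time).

    Skip the first floor(n / e) candidates, then accept the first candidate that
    beats all of them; if none does, the last candidate is accepted by default.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one candidate")
    k = int(n / math.e)  # length of the observation phase
    threshold = max(values[:k], default=float("-inf"))
    for i in range(k, n):
        if values[i] > threshold:
            return i
    return n - 1


print(secretary_pick([3, 1, 4, 1, 5, 9, 2, 6]))  # → 2
```

On this sequence the rule stops at 4 rather than 9. This is expected: the rule maximizes the probability of picking the best candidate (about 1/e for large n), not the outcome on every sequence.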
Autobidding is becoming increasingly important in online advertising and has become a critical tool used by many advertisers to optimize their ad campaigns. We formulate fundamental questions around the problem of bidding for performance under very general affine cost constraints. We design optimal single-agent bidding strategies for the general bidding problem in multi-slot truthful auctions. We show that there is an intimate connection between bidding and auction design, in that the bidding formula is optimal if and only if the underlying auction is truthful. We show how a multiplicative weights update (MWU) algorithm can be used to learn this optimal bidding formula. Next, we move from the single-agent view to a full-system view: what happens when all advertisers adopt optimal autobidding? We prove that in general settings, there exists an equilibrium between the bidding agents for all the advertisers. Further, we prove a Price of Anarchy result: for the general affine constraints, the total value (conversions) obtained by the advertisers in the bidding agent equilibrium is no less than 1/2 of what we could generate via a centralized ad allocation scheme, one that does not consider any auction incentives or provide any per-advertiser guarantee.
Companies like Google and Microsoft run billions of auctions every day to sell advertising opportunities. Any change to the rules of these auctions can have a tremendous effect on the revenue of the company and the welfare of the advertisers and the users. Therefore, any change requires careful evaluation of its potential impacts. Currently, such impacts are often evaluated by running simulations or small controlled experiments. This, however, misses an important factor: advertisers respond to changes. Our goal is to build a theoretical framework for predicting the actions of an agent (the advertiser) who is optimizing her actions in an uncertain environment. We model this problem using a variant of the multi-armed bandit setting where playing an arm is costly. The cost of each arm changes over time and is publicly observable. The value of playing an arm is drawn stochastically from a static distribution and is observed by the agent but not by us. We do, however, observe the agent's actions. Our main result is that, assuming the agent plays a strategy with regret at most $f(T)$ within the first $T$ rounds, we can learn to play the multi-armed bandit game (without observing the rewards) in such a way that the regret of our selected actions is at most $O(k^4 (f(T) + 1) \log T)$.
Variance Reduction in Bipartite Experiments through Correlation Clustering
Warren Schudy
Thirty-third Conference on Neural Information Processing Systems (2019) (to appear)
Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization. However, this design has not been studied in the bipartite setting, where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments.
Strategizing against No-regret Learners
Advances in Neural Information Processing Systems (2019), pp. 1579-1587
How should a player who repeatedly plays a game against a no-regret learner strategize to maximize his utility? We study this question and show that under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot get any higher utility than the Stackelberg equilibrium utility. But when the no-regret learner has more than two actions and plays a mean-based no-regret strategy, we show that the player can get strictly higher than the Stackelberg equilibrium utility. We provide a characterization of the optimal game-play for the player against a mean-based no-regret learner as a solution to a control problem. When the no-regret learner's strategy also guarantees him a no-swap regret, we show that the player cannot get anything higher than a Stackelberg equilibrium utility.
We study the dynamic mechanism design problem of a seller who repeatedly sells independent items to a buyer with private values. In this setting, the seller could potentially extract the entire buyer surplus by running efficient auctions and charging an upfront participation fee at the beginning of the horizon. In some markets, such as internet advertising, participation fees are not practical since buyers expect to inspect items before purchasing them. This motivates us to study the design of dynamic mechanisms under successively more stringent requirements that capture the implicit business constraints of these markets. We first consider a periodic individual rationality constraint, which limits the mechanism to charge at most the buyer's value in each period. While this prevents large upfront participation fees, the seller can still design mechanisms that spread a participation fee across the first few auctions. These mechanisms have the unappealing feature that they provide close-to-zero buyer utility in the first auctions in exchange for higher utility in future auctions. To address this problem, we introduce a martingale utility constraint, which imposes the requirement that from the perspective of the buyer, the next item's expected utility is equal to the present one's. Our main result is providing a dynamic auction satisfying martingale utility and periodic individual rationality whose profit loss with respect to first-best (full extraction of buyer surplus) is optimal up to polylogarithmic factors. The proposed mechanism is a dynamic two-tier auction with a hard floor and a soft floor that allocates the item whenever the buyer's bid is above the hard floor and charges the minimum of the bid and the soft floor.
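The per-period rule of the two-tier auction, as stated in the abstract, fits in a few lines. The abstract does not describe how the hard and soft floors are set and updated over time, so the sketch takes them as given inputs. It also assumes that "above the hard floor" means strictly above and that the hard floor does not exceed the soft floor. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def two_tier_outcome(bid: float, hard_floor: float, soft_floor: float) -> Tuple[bool, float]:
    """One period: allocate if bid > hard floor; charge min(bid, soft floor)."""
    if bid > hard_floor:
        return True, min(bid, soft_floor)
    return False, 0.0


def run_periods(
    bids: Sequence[float], floors: Sequence[Tuple[float, float]]
) -> List[Tuple[bool, float]]:
    """Apply the per-period rule with externally supplied (hard, soft) floors."""
    return [two_tier_outcome(b, hard, soft) for b, (hard, soft) in zip(bids, floors)]


# Hypothetical floors held fixed at hard = 2, soft = 4 for three periods.
print(run_periods([5.0, 3.0, 1.0], [(2.0, 4.0)] * 3))
# → [(True, 4.0), (True, 3.0), (False, 0.0)]
```

A bid between the two floors pays its own bid, and a bid above the soft floor pays the soft floor. The item is withheld only when the bid does not exceed the hard floor.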
In the Submodular Welfare Maximization (SWM) problem, the input consists of a set of $n$ items, each of which must be allocated to one of $m$ agents. Each agent $l$ has a valuation function $v_l$, where $v_l(S)$ denotes the welfare obtained by this agent if she receives the set of items $S$. The functions $v_l$ are all submodular; as is standard, we assume that they are monotone and $v_l(\emptyset) = 0$. The goal is to partition the items into $m$ disjoint subsets $S_1, S_2, \ldots, S_m$ in order to maximize the social welfare, defined as $\sum_{l=1}^{m} v_l(S_l)$. A simple greedy algorithm gives a 1/2-approximation to SWM in the offline setting, and this was the best known until Vondrák's recent $(1 - 1/e)$-approximation algorithm [34]. In this paper, we consider the online version of SWM. Here, items arrive one at a time in an online manner. When an item arrives, the algorithm must make an irrevocable decision about which agent to assign it to before seeing any subsequent items. This problem is motivated by applications to Internet advertising, where user ad impressions must be allocated to advertisers whose value is a submodular function of the set of users / impressions they receive. There are two natural models that differ in the order in which items arrive. In the fully adversarial setting, an adversary can construct an arbitrary / worst-case instance, as well as pick the order in which items arrive in order to minimize the algorithm's performance. In this setting, the 1/2-competitive greedy algorithm is the best possible. To improve on this, one must weaken the adversary slightly: in the random order model, the adversary can construct a worst-case set of items and valuations, but does not control the order in which the items arrive; instead, they are assumed to arrive in a random order.
The random order model has been well studied for online SWM and various special cases, but the best known competitive ratio (even for several special cases) is $1/2 + 1/n$ [9,10], barely better than the ratio for the adversarial order. Obtaining a competitive ratio of $1/2 + \Omega(1)$ for the random order model has been an important open problem for several years. We solve this open problem by demonstrating that the greedy algorithm has a competitive ratio of at least 0.505 for online SWM in the random order model. This is the first result showing a competitive ratio bounded above 1/2 in the random order model, even for special cases such as the weighted matching or budgeted allocation problems (without the so-called 'large capacity' assumptions). For special cases of submodular functions, we provide a different analysis that gives a competitive ratio of 0.51. These special cases include weighted matching, weighted coverage functions, and a broader class of "second-order supermodular" functions. We analyze the greedy algorithm using a factor-revealing linear program, bounding how the assignment of one item can decrease potential welfare from assigning future items. We also formulate a natural conjecture which, if true, would improve the competitive ratio of the greedy algorithm to at least 0.567. In addition to our new competitive ratios for online SWM, we make two further contributions. First, we define the classes of second-order modular, supermodular, and submodular functions, which are likely to be of independent interest in submodular optimization.
Second, we obtain an improved competitive ratio via a technique we refer to as gain linearizing, which may be useful in other contexts (see [26]). Essentially, we linearize the submodular function by dividing the gain of an optimal solution into gain from individual elements. We compare the algorithm's gain when it assigns an element with the optimal solution's gain from that element, and, crucially, bound the extent to which assigning elements can affect the potential gain of other elements.
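The greedy algorithm in the SWM abstract assigns each arriving item j to the agent with the largest marginal gain $v_l(S_l \cup \{j\}) - v_l(S_l)$. A minimal sketch follows. The budget-additive valuation is only an example of a monotone submodular function with $v_l(\emptyset) = 0$, and the helper names are hypothetical. To match the random order model, the caller would shuffle the items before passing them in.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence

Valuation = Callable[[FrozenSet[Hashable]], float]


def greedy_swm(items: Sequence[Hashable], valuations: List[Valuation]) -> List[FrozenSet[Hashable]]:
    """Online greedy: give each arriving item to the agent with the largest marginal gain.

    Ties go to the lowest-indexed agent. Each assignment is irrevocable.
    """
    bundles: List[FrozenSet[Hashable]] = [frozenset() for _ in valuations]
    for item in items:
        gains = [v(bundles[a] | {item}) - v(bundles[a]) for a, v in enumerate(valuations)]
        best = max(range(len(valuations)), key=lambda a: gains[a])
        bundles[best] = bundles[best] | {item}
    return bundles


def budget_additive(weights: Dict[Hashable, float], cap: float) -> Valuation:
    """Example monotone submodular valuation: additive weights capped at `cap`."""
    return lambda bundle: min(sum(weights[i] for i in bundle), cap)


agents = [budget_additive({"x": 3, "y": 3}, cap=4), budget_additive({"x": 2, "y": 2}, cap=10)]
bundles = greedy_swm(["x", "y"], agents)
print(sum(v(s) for v, s in zip(agents, bundles)))  # → 5
```

In this example, item "x" goes to the first agent (gain 3 versus 2). Item "y" then goes to the second agent, because the first agent's cap leaves it a marginal gain of only 1.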
Motivated by Internet advertising applications, online allocation problems have been studied extensively in various adversarial and stochastic models. The adversarial arrival models are too pessimistic. Meanwhile, many of the stochastic arrival models (such as i.i.d. or random-order) do not realistically capture uncertainty in predictions. A significant cause of such uncertainty is the presence of unpredictable traffic spikes, often due to breaking news or similar events. To address this issue, a simultaneous approximation framework has been proposed to develop algorithms that work well in both the adversarial and stochastic models. However, this framework does not enable algorithms that make good use of partially accurate forecasts when making online decisions. In this paper, we propose a robust online stochastic model that captures the nature of traffic spikes in online advertising. In our model, in addition to the stochastic input for which we have good forecasts, an unknown number of adversarially chosen impressions arrive. We design algorithms that combine a stochastic algorithm with an online algorithm that adaptively reacts to inaccurate predictions. We provide provable bounds for our new algorithms in this framework. We accompany our positive results with a set of hardness results showing that our algorithms are not far from optimal in this framework. As a byproduct of our results, we also present improved online algorithms for a slight variant of the simultaneous approximation framework.