Vahab S. Mirrokni

Vahab Mirrokni is a Google Fellow and VP at Google Research, leading algorithm and optimization research groups at Google. These teams include market algorithms, large-scale graph mining, and large-scale optimization. Previously he was a Distinguished Scientist and Senior Research Director at Google. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 2001. He joined Google Research in 2008, after research positions at Microsoft Research, MIT, and Amazon.com. He is a co-winner of best paper awards at KDD, ACM EC, and SODA. His research areas include algorithms, distributed and stochastic optimization, and computational economics. Recently he has been working on various algorithmic problems in machine learning, online optimization and mechanism design, and large-scale graph-based learning. His full publication list by year can be found here. He also has an outdated personal homepage.
Authored Publications
In recent years, the growing adoption of autobidding has motivated the study of auction design with value-maximizing auto-bidders. It is known that under mild assumptions, uniform bid-scaling is an optimal bidding strategy in truthful auctions, e.g., the Vickrey-Clarke-Groves (VCG) auction, and the price of anarchy for VCG is 2. However, for other auction formats like the First-Price Auction (FPA) and the Generalized Second-Price (GSP) auction, uniform bid-scaling may not be an optimal bidding strategy, and bidders have incentives to deviate to strategies with non-uniform bid-scaling. Moreover, FPA can achieve optimal welfare if restricted to uniform bid-scaling, while its price of anarchy becomes 2 when non-uniform bid-scaling strategies are allowed. All of these price-of-anarchy results concern welfare approximation in worst-case scenarios. To complement the theoretical understanding, we empirically study how different auction formats (FPA, GSP, VCG) with different levels of non-uniform bid-scaling perform in an autobidding world, using a synthetic dataset of auctions. Our empirical findings include:
• For both uniform and non-uniform bid-scaling, FPA outperforms GSP, and GSP outperforms VCG, in terms of both welfare and profit.
• A higher level of non-uniform bid-scaling leads to lower welfare in both FPA and GSP, while the level of non-uniform bid-scaling has no effect in VCG.
Our methodology for synthetic data generation may be of independent interest.
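The comparison above can be illustrated with a small simulation. Below is a minimal sketch, assuming single-slot auctions, lognormal synthetic values, and exogenous bid multipliers (the paper's autobidders instead reach equilibrium under ROAS constraints); run_auctions and the multiplier values are hypothetical stand-ins, not the paper's methodology.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bidders, n_auctions = 4, 10_000

# Hypothetical synthetic values of each bidder in each auction.
values = rng.lognormal(mean=0.0, sigma=1.0, size=(n_bidders, n_auctions))

def run_auctions(bids, values, first_price):
    """Single-slot auctions: the highest bid wins each auction.
    With one slot, the second-price auction coincides with VCG."""
    winner = bids.argmax(axis=0)
    welfare = values[winner, np.arange(bids.shape[1])].sum()
    sorted_bids = np.sort(bids, axis=0)
    price = sorted_bids[-1] if first_price else sorted_bids[-2]
    return welfare, price.sum()

# Uniform bid-scaling: one multiplier per bidder across all auctions.
mult_uniform = np.array([1.2, 1.0, 0.9, 1.1])[:, None]
# Non-uniform bid-scaling: a different multiplier in every auction.
mult_nonuniform = rng.uniform(0.5, 2.0, size=(n_bidders, n_auctions))

for name, mult in [("uniform", mult_uniform), ("non-uniform", mult_nonuniform)]:
    for fmt, fp in [("FPA", True), ("SPA/VCG", False)]:
        w, r = run_auctions(values * mult, values, first_price=fp)
        print(f"{name:12s} {fmt:8s} welfare={w:10.1f} revenue={r:10.1f}")
```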
In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace autobidding, and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction design.
Motivated by the increased adoption of autobidding algorithms in internet advertising markets, we study the design of optimal mechanisms for selling items to a value-maximizing buyer with a return-on-spend constraint. The buyer's values and the target ratio in the return-on-spend constraint are private. We restrict attention to deterministic sequential screening mechanisms that can be implemented as a menu of prices paid for purchasing an item or not. The main result of this paper is a characterization of an optimal mechanism. Surprisingly, we show that the optimal mechanism does not require target screening; i.e., offering a single pair of prices is optimal for the seller. The optimal mechanism is a subsidized posted price: it provides a subsidy to the buyer to encourage participation and then charges a fixed unit price for each item sold. The seller's problem is a challenging non-linear mechanism design problem, and a key technical contribution of our work is a novel approach to analyzing non-linear pricing contracts.
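To make the subsidized posted price concrete, here is a minimal sketch of the buyer's side: the seller pays a subsidy s on participation and then charges unit price p per item, and the buyer greedily buys while the return-on-spend constraint holds. buyer_demand and all parameter values are hypothetical; the paper's derivation of the optimal (s, p) is far more involved.

```python
def buyer_demand(values, p, s, tau):
    """Greedy sketch of a value-maximizing buyer with ROS target tau:
    buy items in decreasing value order while total value stays at
    least tau times the net spend (unit price minus the subsidy)."""
    chosen_value, count = 0.0, 0
    for v in sorted(values, reverse=True):
        net_spend = p * (count + 1) - s  # spend after the subsidy
        if chosen_value + v >= tau * max(net_spend, 0.0):
            chosen_value += v
            count += 1
        else:
            break
    return count, chosen_value

count, total = buyer_demand(values=[5.0, 3.0, 2.0, 0.5], p=2.0, s=1.0, tau=1.5)
print(f"buyer purchases {count} items with total value {total}")
```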
We study the price of anarchy of the generalized second-price (GSP) auction when bidders are value maximizers (i.e., autobidders). We show that in general the price of anarchy can be as bad as 0. For comparison, the price of anarchy of VCG is 1/2 in the autobidding world. We further show a fine-grained price of anarchy with respect to the discount factors (i.e., the ratios of click probabilities between lower slots and the highest slot in each auction) in the GSP auction, which highlights the qualitative relation between the smoothness of the discount factors and the efficiency of the auction.
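For readers unfamiliar with the discount-factor model, the sketch below spells out GSP allocation and per-click pricing with position discounts; the gsp helper and the example numbers are hypothetical illustrations, not the paper's constructions.

```python
import numpy as np

def gsp(bids, discounts):
    """GSP with position discounts: slot j has click probability
    discounts[j] relative to the top slot; bidders are ranked by bid
    and each winner pays the next-highest bid per click."""
    order = np.argsort(-np.asarray(bids))
    slots = min(len(discounts), len(bids))
    alloc = order[:slots]
    prices = [bids[order[j + 1]] if j + 1 < len(bids) else 0.0
              for j in range(slots)]
    return alloc, prices

bids = [5.0, 3.0, 1.0]
discounts = [1.0, 0.5]  # the ratios whose "smoothness" governs efficiency
alloc, prices = gsp(bids, discounts)
print("slot winners:", alloc, " per-click prices:", prices)
```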
Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding x ∈ R^d per data point, allowing for fast retrieval via highly optimized maximum inner product search (MIPS) algorithms. Recently, beginning with the landmark ColBERT paper, multi-vector models, which produce a set of embeddings per data point, have achieved markedly superior performance on IR tasks. Unfortunately, using these models for IR is computationally expensive due to the increased complexity of multi-vector retrieval and scoring. In this paper, we introduce MUVERA (Multi-Vector Retrieval Algorithm), a retrieval mechanism that reduces multi-vector similarity search to single-vector similarity search. This enables the use of off-the-shelf MIPS solvers for multi-vector retrieval. MUVERA asymmetrically generates Fixed Dimensional Encodings (FDEs) of queries and documents, which are vectors whose inner product approximates multi-vector similarity. We prove that FDEs give high-quality ε-approximations, thus providing the first single-vector proxy for multi-vector similarity with theoretical guarantees. Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5× fewer candidates. Compared to prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of the BEIR retrieval datasets, achieving on average 10% higher recall with 90% lower latency.
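A stripped-down illustration of the FDE idea follows, assuming a single SimHash partition and omitting the repetitions, inner projections, and empty-bucket handling that the actual MUVERA construction uses; chamfer and fde are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

def chamfer(Q, D):
    """Multi-vector (ColBERT-style) similarity: for each query vector,
    take its max inner product over document vectors, then sum."""
    return (Q @ D.T).max(axis=1).sum()

def fde(X, planes, is_query):
    """Simplified Fixed Dimensional Encoding: SimHash each vector into
    one of 2^k buckets; queries SUM per bucket, documents AVERAGE per
    bucket; concatenate the per-bucket vectors into one long vector."""
    k, d = planes.shape[0], X.shape[1]
    buckets = (X @ planes.T > 0).astype(int) @ (1 << np.arange(k))
    out = np.zeros((2 ** k, d))
    for b in range(2 ** k):
        members = X[buckets == b]
        if len(members):
            out[b] = members.sum(axis=0) if is_query else members.mean(axis=0)
    return out.ravel()

d, k = 64, 4
planes = rng.normal(size=(k, d))   # shared random SimHash partition
Q = rng.normal(size=(32, d))       # a query's multi-vector embedding
D = rng.normal(size=(80, d))       # a document's multi-vector embedding
approx = fde(Q, planes, is_query=True) @ fde(D, planes, is_query=False)
print(f"chamfer={chamfer(Q, D):.2f}  fde inner product={approx:.2f}")
```

The asymmetry (sum for queries, average for documents) is what lets one inner product of two fixed-dimensional vectors stand in for the sum-of-max multi-vector score.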
    Mechanism Design for Large Language Models
    Paul Duetting
    Haifeng Xu
    Proceedings of the ACM on Web Conference 2024, Association for Computing Machinery, New York, NY, USA, 144–155
We investigate auction mechanisms for AI-generated content, focusing on applications like ad creative generation. In our model, agents' preferences over stochastically generated content are encoded as large language models (LLMs). We propose an auction format that operates on a token-by-token basis and allows LLM agents to influence content creation through single-dimensional bids. We formulate two desirable incentive properties and prove their equivalence to a monotonicity condition on output aggregation. This equivalence enables a second-price rule design, even absent explicit agent valuation functions. Our design is supported by demonstrations on a publicly available LLM.
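As a toy illustration of the token-by-token format, the sketch below mixes agents' next-token distributions with bid-proportional weights, a linear aggregation that is monotone in each bid; the Dirichlet distributions stand in for real LLM outputs, and the paper's payment computation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["buy", "our", "shoes", "coffee", "now"]

# Stand-ins for two agents' LLM next-token distributions over the vocabulary.
p_agents = rng.dirichlet(np.ones(len(vocab)), size=2)
bids = np.array([3.0, 1.0])  # single-dimensional bids

def aggregate(p_agents, bids):
    """Linear aggregation: mix the agents' token distributions with
    weights proportional to their bids. Each agent's influence on the
    output is monotone in its bid, the property the incentive analysis
    hinges on."""
    return (bids / bids.sum()) @ p_agents

mixed = aggregate(p_agents, bids)
print("sampled token:", rng.choice(vocab, p=mixed))
```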
    HyperAttention: Large-scale Attention in Linear Time
    Amin Karbasi
    Amir Zandieh
    Insu Han
    David Woodruff
    HyperAttention: Long-context Attention in Near-Linear Time (2024) (to appear)
In this paper, we introduce a novel approximate attention mechanism dubbed HyperAttention. Despite the rapidly increasing size and complexity of contexts used with Large Language Models (LLMs), there is still a dire lack of computationally efficient attention mechanisms that scale better than naive quadratic time. HyperAttention addresses this gap: it delivers provably linear time complexity with respect to the size of the context, while incurring only a negligible loss in downstream quality. Distinctively, it integrates the principles of Locality Sensitive Hashing (LSH), for efficient detection of heavy elements, with uniform column sampling, allowing for a good approximation of both the heavy and light components of the attention matrix. HyperAttention provably approximates the attention layer in linear time, making it the first practical linear-time approximate attention mechanism. Crucially, HyperAttention has a highly modular design, allowing seamless integration of other fast low-level implementations, most notably FlashAttention. Empirical evaluations indicate that HyperAttention surpasses existing methods, achieving orders-of-magnitude speed-ups compared to prevalent state-of-the-art solutions such as FlashAttention. This breakthrough has significant implications for scaling LLMs to substantially larger contexts.
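The following toy sketch conveys the two ingredients, assuming random-hyperplane LSH for heavy-entry detection and uniform column sampling for the light mass. It is a quadratic-time illustration of the estimator, not the paper's near-linear algorithm, and hyperattention_sketch is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)

def hyperattention_sketch(Q, K, V, k_hash=4, m=32):
    """Toy estimator in the spirit of HyperAttention: exact attention on
    keys whose LSH bucket collides with the query's (likely-heavy entries),
    plus a uniform sample of the remaining keys, scaled up, for the light
    mass. Both the numerator and the softmax denominator are estimated."""
    n, d = Q.shape
    planes = rng.normal(size=(k_hash, d))
    hq = (Q @ planes.T > 0).astype(int) @ (1 << np.arange(k_hash))
    hk = (K @ planes.T > 0).astype(int) @ (1 << np.arange(k_hash))
    out = np.zeros_like(V)
    for i in range(n):
        heavy = np.flatnonzero(hk == hq[i])
        light = np.flatnonzero(hk != hq[i])
        m_i = min(m, len(light))
        samp = rng.choice(light, size=m_i, replace=False) if m_i else light
        w_heavy = np.exp(Q[i] @ K[heavy].T)
        w_samp = np.exp(Q[i] @ K[samp].T)
        scale = len(light) / max(m_i, 1)  # inflate the sampled light mass
        denom = w_heavy.sum() + scale * w_samp.sum()
        out[i] = (w_heavy @ V[heavy] + scale * (w_samp @ V[samp])) / denom
    return out

n, d = 256, 32
Q, K, V = (rng.normal(size=(n, d)) / d**0.25 for _ in range(3))
exact = np.exp(Q @ K.T)
exact = (exact / exact.sum(axis=1, keepdims=True)) @ V
approx = hyperattention_sketch(Q, K, V)
print("mean absolute error:", np.abs(exact - approx).mean())
```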
    PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
    Adel Javanmard
    Proceedings of the 41st International Conference on Machine Learning (2024), pp. 21410-21429
This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called bags in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained k-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We propose the PriorBoost algorithm, which iteratively forms increasingly homogeneous bags with respect to (unseen) individual responses to improve model quality. We also explore label differential privacy for aggregate learning, and we provide extensive experiments demonstrating that PriorBoost regularly achieves optimal model quality, in contrast to non-adaptive algorithms for aggregate learning.
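The reduction to one-dimensional size-constrained clustering has a simple consequence: sorting by a prior model's predictions and slicing into consecutive bags is the natural bagging step. Below is a minimal sketch of one such round for linear regression; the data, the curated_bags helper, and the coefficient values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def curated_bags(prior_scores, bag_size):
    """One-dimensional size-constrained clustering of the prior model's
    predictions reduces to sorting and slicing into consecutive bags."""
    order = np.argsort(prior_scores)
    return [order[i:i + bag_size] for i in range(0, len(order), bag_size)]

# Hypothetical data: features, individual responses, and a first-round model.
n, bag_size = 1000, 10
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
prior = X @ np.array([0.8, -1.5, 0.3])  # imperfect prior predictions

bags = curated_bags(prior, bag_size)
# The learner sees only bag-level aggregates of the responses.
X_bag = np.stack([X[b].mean(axis=0) for b in bags])
y_bag = np.array([y[b].mean() for b in bags])
beta_hat = np.linalg.lstsq(X_bag, y_bag, rcond=None)[0]
print("coefficients recovered from aggregates:", beta_hat.round(2))
```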
    Individual Welfare Guarantees in the Autobidding World with Machine-learned Advice
    Negin Golrezaei
    Patrick Jaillet
    Jason Cheuk Nam Liang
    Proceedings of the ACM on Web Conference 2024, 267–275
Online advertising channels commonly focus on maximizing total advertiser welfare to enhance channel health, and previous literature has studied augmenting ad auctions with machine learning predictions of advertiser values (also known as machine-learned advice) to improve total welfare. Yet such improvements could come at the cost of individual bidders' welfare, and they do not shed light on how particular advertiser bidding strategies impact welfare. Motivated by this, we present an analysis of an individual bidder's welfare loss in the autobidding world for auctions with and without machine-learned advice, and we also uncover how advertiser strategies relate to such losses. In particular, we demonstrate how ad platforms can utilize ML advice to improve welfare guarantees at the aggregate and individual bidder levels by setting ML advice as personalized reserve prices when the platform consists of autobidders who maximize value while respecting a return-on-ad-spend (ROAS) constraint. Under parallel VCG auctions with such ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for an individual autobidder, and we show that the lower-bound guarantee is positively correlated with the quality of the ML advice as well as with the scale of bids induced by the autobidder's bidding strategy. Further, we show that no truthful (and possibly randomized) mechanism with anonymous allocations can achieve universally better individual welfare guarantees than VCG in the presence of personalized reserves based on ML advice of equal quality. Moreover, we extend our individual welfare guarantee results to generalized first-price (GFP) and generalized second-price (GSP) auctions. Finally, we present numerical studies using semi-synthetic data derived from ad auction logs of a search ad platform to showcase improvements in individual welfare when setting personalized reserve prices with ML advice.
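As a concrete illustration of ML advice-based reserves, here is a minimal single-slot sketch in which each bidder faces a personalized reserve derived from a hypothetical value prediction; the 0.8 discount and all the numbers are illustrative, not from the paper.

```python
import numpy as np

def spa_with_reserves(bids, reserves):
    """Single-slot second-price (VCG) auction with personalized reserves:
    only bids clearing their own reserve are eligible; the winner pays the
    larger of the runner-up eligible bid and their own reserve."""
    eligible = np.flatnonzero(bids >= reserves)
    if len(eligible) == 0:
        return None, 0.0
    order = eligible[np.argsort(-bids[eligible])]
    winner = order[0]
    runner_up = bids[order[1]] if len(order) > 1 else 0.0
    return winner, max(runner_up, reserves[winner])

# Hypothetical ML advice: value predictions, discounted for prediction
# error, serve as personalized reserves; bids here are truthful values.
values = np.array([4.0, 3.0, 2.5])
advice = np.array([3.6, 3.2, 2.0])
winner, price = spa_with_reserves(values, 0.8 * advice)
print(f"winner={winner}, price={price:.2f}")
```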
We introduce TeraHAC, a (1+epsilon)-approximate hierarchical agglomerative clustering (HAC) algorithm which scales to trillion-edge graphs. Our algorithm is based on a new approach to computing (1+epsilon)-approximate HAC, which combines the nearest-neighbor chain algorithm with (1+epsilon)-approximate merge steps. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed. We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time.
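A toy sketch of the approximate merge rule, assuming in-memory average-linkage HAC on a small weighted graph: any pair whose linkage is within a (1+eps) factor of the current best may be merged in the same round. approx_hac is a hypothetical name, and TeraHAC's distributed, partition-local version is substantially more sophisticated.

```python
from collections import defaultdict

def approx_hac(edges, n, eps=0.1):
    """Toy (1+eps)-approximate average-linkage HAC on a weighted graph.
    Classic HAC merges only the single best pair per step; here any pair
    within a (1+eps) factor of the best linkage may merge in the same
    round, the slack that lets a distributed algorithm make progress."""
    size = {i: 1 for i in range(n)}
    W = defaultdict(dict)  # W[a][b] = total edge weight between clusters a, b
    for u, v, w in edges:
        W[u][v] = W[v][u] = W[u].get(v, 0.0) + w
    merges = []
    while True:
        pairs = [(W[a][b] / (size[a] * size[b]), a, b)
                 for a in W for b in W[a] if a < b]
        if not pairs:
            break
        best = max(link for link, _, _ in pairs)
        used = set()
        for link, a, b in sorted(pairs, reverse=True):
            if a in used or b in used or link < best / (1 + eps):
                continue
            used.update((a, b))
            merges.append((a, b, round(link, 3)))
            for c, w in W.pop(b).items():  # merge cluster b into cluster a
                del W[c][b]
                if c != a:
                    W[a][c] = W[c][a] = W[a].get(c, 0.0) + w
            size[a] += size.pop(b)
    return merges

edges = [(0, 1, 1.0), (1, 2, 0.9), (2, 3, 1.0), (0, 2, 0.2), (1, 3, 0.2)]
print(approx_hac(edges, n=4, eps=0.2))
```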