Vahab S. Mirrokni

Vahab S. Mirrokni

Vahab Mirrokni is a Google Fellow and VP at Google Research, leading algorithm and optimization research groups at Google. These research teams include: market algorithms, large-scale graph mining, and large-scale optimization. Previously he was a distinguished scientist and senior research director at Google. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 2001. He joined Google Research in 2008, after research positions at Microsoft Research, MIT and Amazon.com. He is the co-winner of best paper awards at KDD, ACM EC, and SODA. His research areas include algorithms, distributed and stochastic optimization, and computational economics. Recently he has been working on various algorithmic problems in machine learning, online optimization and mechanism design, and large-scale graph-based learning . His full publication list by year can be found here . He also has an out-dated personal homepage.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Mechanism Design for Large Language Models
    Song Zuo
    Paul Duetting
    Haifeng Xu
    Proceedings of the ACM on Web Conference 2024, Association for Computing Machinery, New York, NY, USA, 144–155
    Preview abstract We investigate auction mechanisms for AI-generated content, focusing on applications like ad creative generation. In our model, agents' preferences over stochastically generated content are encoded as large language models (LLMs). We propose an auction format that operates on a token-by-token basis, and allows LLM agents to influence content creation through single dimensional bids. We formulate two desirable incentive properties and prove their equivalence to a monotonicity condition on output aggregation. This equivalence enables a second-price rule design, even absent explicit agent valuation functions. Our design is supported by demonstrations on a publicly available LLM. View details
    HyperAttention: Large-scale Attention in Linear Time
    Amin Karbasi
    Amir Zandieh
    Insu Han
    David Woodruff
    HyperAttention: Long-context Attention in Near-Linear Time(2024) (to appear)
    Preview abstract In this paper, we introduce a novel approximate attention mechanism dubbed ``HyperAttention``. Despite the rapidly increasing size and complexity of contexts used with Large Language Models (LLM), there is still a dire lack of computationally efficient attention mechanisms scaling better than the naive quadratic time. HyperAttention addresses this gap: it delivers provably linear time complexity with respect to the size of the context, while only incurring a negligible loss in downstream quality. Distinctively, it integrates the principles of Locality Sensitive Hashing (LSH), for efficient detection of heavy elements, along with uniform column sampling, allowing for a good approximation both of the heavy and light components of the attention matrix. HyperAttention provably approximates the attention layer in \textit{linear time}, making it the first practical linear time approximate attention mechanism. Crucially, HyperAttention has a highly-modular design, allowing seamless integration of other rapid low-level implementations, most notably FlashAttention. Empirical evaluations indicate that HyperAttention surpasses the existing methods, achieving orders of magnitude speed-up when compared to prevalent state-of-the-art solutions such as Flash Attention. This breakthrough presents significant implications for enabling the scalability of LLMs to significantly larger contexts. View details
    Individual Welfare Guarantees in the Autobidding World with Machine-learned Advice
    Negin Golrezaei
    Patrick Jaillet
    Jason Cheuk Nam Liang
    Proceedings of the ACM on Web Conference 2024, 267–275
    Preview abstract Online advertising channels commonly focus on maximizing total advertiser welfare to enhance channel health, and previous literature has studied augmenting ad auctions with machine learning predictions on advertiser values (also known asmachine-learned advice ) to improve total welfare. Yet, such improvements could come at the cost of individual bidders' welfare and do not shed light on how particular advertiser bidding strategies impact welfare. Motivated by this, we present an analysis on an individual bidder's welfare loss in the autobidding world for auctions with and without machine-learned advice, and also uncover how advertiser strategies relate to such losses. In particular, we demonstrate how ad platforms can utilize ML advice to improve welfare guarantee on the aggregate and individual bidder level by setting ML advice as personalized reserve prices when the platform consists ofautobidders who maximize value while respecting a return on ad spend (ROAS) constraint. Under parallel VCG auctions with such ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for an individual autobidder, and show that the lower-bound guarantee is positively correlated with ML advice quality as well as the scale of bids induced by the autobidder's bidding strategies. Further, we show that no truthful, and possibly randomized mechanism with anonymous allocations can achieve universally better individual welfare guarantees than VCG, in the presence of personalized reserves based on ML-advice of equal quality. Moreover, we extend our individual welfare guarantee results to generalized first price (GFP) and generalized second price (GSP) auctions. Finally, we present numerical studies using semi-synthetic data derived from ad auction logs of a search ad platform to showcase improvements in individual welfare when setting personalized reserve prices with ML-advice. View details
    Non-uniform Bid-scaling and Equilibria for Different Auctions: An Empirical Study
    Jieming Mao
    Yifeng Teng
    Song Zuo
    Proceedings of the ACM on Web Conference 2024, 256–266
    Preview abstract In recent years, the growing adoption of autobidding has motivated the study of auction design with value-maximizing auto-bidders. It is known that under mild assumptions, uniform bid-scaling is an optimal bidding strategy in truthful auctions, e.g., Vickrey-Clarke-Groves auction (VCG), and the price of anarchy for VCG is 2. However, for other auction formats like First-Price Auction (FPA) and Generalized Second-Price auction (GSP), uniform bid-scaling may not be an optimal bidding strategy, and bidders have incentives to deviate to adopt strategies with non-uniform bid-scaling. Moreover, FPA can achieve optimal welfare if restricted to uniform bid-scaling, while its price of anarchy becomes 2 when non-uniform bid-scaling strategies are allowed. All these price of anarchy results have been focused on welfare approximation in the worst-case scenarios. To complement theoretical understandings, we empirically study how different auction formats (FPA, GSP, VCG) with different levels of non-uniform bid-scaling perform in an autobidding world with a synthetic dataset for auctions. Our empirical findings include: * For both uniform bid-scaling and non-uniform bid-scaling, FPA is better than GSP and GSP is better than VCG in terms of both welfare and profit; * A higher level of non-uniform bid-scaling leads to lower welfare performance in both FPA and GSP, while different levels of non-uniform bid-scaling have no effect in VCG. Our methodology of synthetic data generation may be of independent interest. View details
    Efficiency of the Generalized Second-Price Auction for Value Maximizers
    Jieming Mao
    Hanrui Zhang
    Song Zuo
    Proceedings of the ACM on Web Conference 2024, 46–56
    Preview abstract We study the price of anarchy of the generalized second-price auction where bidders are value maximizers (i.e., autobidders). We show that in general the price of anarchy can be as bad as 0. For comparison, the price of anarchy of running VCG is 1/2 in the autobidding world. We further show a fined-grained price of anarchy with respect to the discount factors (i.e., the ratios of click probabilities between lower slots and the highest slot in each auction) in the generalized second-price auction, which highlights the qualitative relation between the smoothness of the discount factors and the efficiency of the generalized second-price auction. View details
    PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses
    Adel Javanmard
    Proceedings of the 41st International Conference on Machine Learning(2024) (to appear)
    Preview abstract This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called \emph{bags} in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We propose the \texttt{PriorBoost} algorithm, which iteratively forms increasingly homogenous bags with respect to (unseen) individual responses to improve model quality. We also explore label differential privacy for aggregate learning, and provide extensive experiments that demonstrate that \PriorBoost regularly achieves optimal quality, in contrast to non-adaptive algorithms for aggregate learning. View details
    Preview abstract We introduceTeraHAC, a (1+epsilon)-approximate hierarchical agglomerative clustering (HAC) algorithm whichs cales to trillion-edge graphs. Our algorithm is based on a new approach to computing (1+epsilon)-approximate HAC, which is a novel combination of the nearest-neighbor chain algorithm and the notion of (1+epsilon)-approximate HAC. Our approach allows us to partition the graph among multiple machines and make significant progress in computing the clustering within each partition before any communication with other partitions is needed.We evaluate TeraHAC on a number of real-world and synthetic graphs of up to 8 trillion edges. We show that TeraHAC requires over 100x fewer rounds compared to previously known approaches for computing HAC. It is up to 8.3x faster than SCC, the state-of-the-art distributed algorithm for hierarchical clustering, while achieving 1.16x higher quality. In fact, TeraHAC essentially retains the quality of the celebrated HAC algorithm while significantly improving the running time. View details
    Optimal Fully Dynamic k-Center Clustering for Adaptive and Oblivious Adversaries
    Hossein Esfandiari
    Hendrik Fichtenberger
    Monika Henzinger
    Andreas Wiese
    Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)
    Preview
    Preview abstract Representative Selection (RS) is the problem of finding a small subset of exemplars from a dataset that is representative of the dataset. In this paper, we study RS for attributed graphs, and focus on finding representative nodes that optimize the accuracy of a model trained on the selected representatives. Theoretically, we establish a new hardness result for RS (in the absence of a graph structure) by proving that a particular, highly practical variant of it (RS for Learning) is hard to approximate in polynomial time within any reasonable factor, which implies a significant potential gap between the optimum solution of widely-used surrogate functions and the actual accuracy of the model. We then study the setting where a (homophilous) graph structure is available, or can be constructed, between the data points. We show that with an appropriate modeling approach, the presence of such a structure can turn a hard RS (for learning) problem into one that can be effectively solved. To this end, we develop RS-GNN, a representation learning-based RS model based on Graph Neural Networks. Empirically, we demonstrate the effectiveness of RS-GNN on problems with predefined graph structures as well as problems with graphs induced from node feature similarities, by showing that RS-GNN achieves significant improvements over established baselines on a suite of eight benchmarks. View details
    Learning Rate Schedules in the Presence of Distribution Shift
    Adel Javanmard
    Pratik Worah
    Proceedings of the 40th International Conference on Machine Learning(2023), pp. 9523-9546
    Preview abstract We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret. View details