Aaron Archer

Aaron Archer

Aaron is a Research Scientist at Google NYC, where he leads the Large-Scale Optimization research team, which is part of the broader NYC Algorithms and Optimization team. He came to Google in 2013 after 9+ years in the Algorithms and Optimization department at the AT&T Shannon Research Laboratory. Aaron earned degrees in Operations Research (M.S. 2002, Ph.D. 2004) from Cornell University, where he was advised by Eva Tardos and supported by a Hertz Fellowship. Prior to that, he studied Mathematics (B.S. 1998) at Harvey Mudd College, where he graduated first in his class. Aaron has pursued theoretical, experimental and applied research in diverse areas including mathematical programming (e.g., linear, integer, and convex programming), approximation algorithms, graph algorithms, algorithmic game theory, online algorithms, network design, load balancing, and unsupervised machine learning (e.g., clustering). He particularly enjoys finding effective ways to model complex real-world optimization problems. His primary mission at Google has been to apply these and other techniques to improve the efficiency of Google's computational infrastructure, such as the backend for websearch. Aaron has also made small forays into graphics, machine vision, combinatorics, graph theory, computational ecology, and mathematical rap.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Practical Performance Guarantees for Pipelined DNN Inference
    Kuikui Liu
    Prakash Prabhu
    Proceedings of the 41st International Conference on Machine Learning(2024) (to appear)
    Preview abstract This work optimizes pipeline parallelism of machine learning model inference by partitioning computation graphs into $k$ stages and minimizing the running time of the bottleneck stage. We design practical algorithms for this NP-complete problem and prove they are nearly optimal in practice by comparing against lower bounds obtained from solving novel mixed-integer programming (MIP) formulations. We apply these algorithms and lower-bound techniques to production models to achieve substantial improvements in the approximation guarantees, compared to simple combinatorial lower bounds. For example, our new MIP formulations improve the lower bounds enough to drop the geometric mean approximation ratio from $2.175$ to $1.082$ across production data with $k=16$ pipeline stages. This work shows that while bottleneck partitioning is theoretically hard, in practice we have a handle on the algorithmic side of the problem and much of the remaining challenge is in developing more accurate cost models to give to the partitioning algorithms. View details
    Load is not what you should balance: Introducing Prequal
    Bartek Wydrowski
    Bobby Kleinberg
    Steve Rumble
    (2024)
    Preview abstract We present Prequal (\emph{Probing to Reduce Queuing and Latency}), a load balancer for distributed multi-tenant systems. Prequal aims to minimize real-time request latency in the presence of heterogeneous server capacities and non-uniform, time-varying antagonist load. It actively probes server load to leverage the \emph{power of $d$ choices} paradigm, extending it with asynchronous and reusable probes. Cutting against received wisdom, Prequal does not balance CPU load, but instead selects servers according to estimated latency and active requests-in-flight (RIF). We explore its major design features on a testbed system and evaluate it on YouTube, where it has been deployed for more than two years. Prequal has dramatically decreased tail latency, error rates, and resource use, enabling YouTube and other production systems at Google to run at much higher utilization. View details
    Cache-aware load balancing of data center applications
    Aaron Schild
    Ray Yang
    Richard Zhuang
    Proceedings of the VLDB Endowment, 12(2019), pp. 709-723
    Preview abstract Our deployment of cache-aware load balancing in the Google web search backend reduced cache misses by ~0.5x, contributing to a double-digit percentage increase in the throughput of our serving clusters by relieving a bottleneck. This innovation has benefited all production workloads since 2015, serving billions of queries daily. A load balancer forwards each query to one of several identical serving replicas. The replica pulls each term's postings list into RAM from flash, either locally or over the network. Flash bandwidth is a critical bottleneck, motivating an application-directed RAM cache on each replica. Sending the same term reliably to the same replica would increase the chance it hits cache, and avoid polluting the other replicas' caches. However, most queries contain multiple terms and we have to send the whole query to one replica, so it is not possible to achieve a perfect partitioning of terms to replicas. We solve this via a voting scheme, whereby the load balancer conducts a weighted vote by the terms in each query, and sends the query to the winning replica. We develop a multi-stage scalable algorithm to learn these weights. We first construct a large-scale term-query graph from logs and apply a distributed balanced graph partitioning algorithm to cluster each term to a preferred replica. This yields a good but simplistic initial voting table, which we then iteratively refine via cache simulation to capture feedback effects. View details
    Wireless coverage prediction via parametric shortest paths
    David Applegate
    David S. Johnson
    Evdokia Nikolova
    Mikkel Thorup
    Proceedings of the Nineteenth ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc'18), ACM(2018), pp. 221-230
    Preview abstract When deciding where to place access points in a wireless network, it is useful to model the signal propagation loss between a proposed antenna location and the areas it may cover. The indoor dominant path (IDP) model, introduced by Wölfle et al., is shown in the literature to have good validation and generalization error, is faster to compute than competing methods, and is used in commercial software such as WinProp, iBwave Design, and CellTrace. Previously, the algorithms known for computing it involved a worst-case exponential-time tree search, with pruning heuristics to speed it up. We prove that the IDP model can be reduced to a parametric shortest path computation on a graph derived from the walls in the floorplan. It therefore admits a quasipolynomial-time (i.e., nO(logn)) algorithm. We also give a practical approximation algorithm based on running a small constant number of shortest path computations. Its provable worst-case additive error (in dB) can be made arbitrarily small via appropriate choices of parameters, and is well below 1dB for reasonable choices. We evaluate our approximation algorithm empirically against the exact IDP model, and show that it consistently beats its theoretical worst-case bounds, solving the model exactly (i.e., no error) in the vast majority of cases. View details
    Preview abstract We consider the reachability indexing problem for private-public directed graphs. In these graphs nodes come in three flavors: public—nodes visible to all users, private—nodes visible to a specific set of users, and protected—nodes visible to any user who can see at least one of the node’s parents. We are interested in computing the set of nodes visible to a specific user online. There are two obvious algorithms: precompute the result for every user, or run a reachability algorithm at query time. This paper explores the trade-off between these two strategies. Our approach is to identify a set of additional visible seed nodes for each user. The online reachability algorithm explores the graph starting at these nodes. We first formulate the problem as asymmetric k-center with outliers, and then give an efficient and practical algorithm. We prove new theoretical guarantees for this problem and show empirically that it performs very well in practice. View details
    Optimal Content Placement for a Large-Scale VoD System
    David Applegate
    Vijay Gopalakrishnan
    K.K. Ramakrishnan
    IEEE/ACM Transactions on Networking, 24(2016), pp. 2114-2127
    Preview abstract IPTV service providers offering Video-on-Demand currently use servers at each metropolitan office to store all the videos in their library. With the rapid increase in library sizes, it will soon become infeasible to replicate the entire library at each office. We present an approach for intelligent content placement that scales to large library sizes (e.g., 100Ks of videos). We formulate the problem as a mixed integer program (MIP) that takes into account constraints such as disk space, link bandwidth, and content popularity. To overcome the challenges of scale, we employ a Lagrangian relaxation-based decomposition technique combined with integer rounding. Our technique finds a near-optimal solution (e.g., within 1-2%) with orders of magnitude speedup relative to solving even the LP relaxation via standard software. We also present simple strategies to address practical issues such as popularity estimation, content updates, short-term popularity fluctuation, and frequency of placement updates. Using traces from an operational system, we show that our approach significantly outperforms simpler placement strategies. For instance, our MIP-based solution can serve all requests using only half the link bandwidth used by LRU or LFU cache replacement policies. We also investigate the trade-off between disk space and network bandwidth. View details
    Content placement via the exponential potential function method
    David Applegate
    Vijay Gopalakrishnan
    K.K. Ramakrishnan
    Integer Programming and Combinatorial Optimization - 16th International Conference, IPCO 2013, Springer, pp. 49-61
    Leveraging video viewing patterns for optimal content placement
    Kyung-Wook Hwang
    David Applegate
    Vijay Gopalakrishnan
    Vishal Misra
    K. K. Ramakrishnan
    Deborah F. Swayne
    NETWORKING 2012 - 11th International IFIP TC 6 Networking Conference Proceedings, Part II, pp. 44-58
    Combining predictors for recommending music: the False Positives' approach to KDD Cup track 2
    Suhrid Balakrishnan
    Rensheng Wang
    Carlos Eduardo Scheidegger
    Angus MacLellan
    Yifan Hu
    Shankar Krishnan
    David Applegate
    Guangqin Ma
    S. Tom Au
    JMLR Proceedings, 18(2012), pp. 199-213
    Improved Approximation Algorithms for Prize-Collecting Steiner Tree and TSP
    MohammadTaghi Hajiaghayi
    Howard Karloff
    SIAM Journal on Computing, 40(2)(2011), pp. 309-332
    Voting power and target-based site prioritization
    Steven J. Phillips
    Robert L. Pressey
    Desmond Torkornoo
    David Applegate
    David Johnson
    Matthew E. Watts
    Biological Conservation, 143(2010), pp. 1989-1997
    Improved approximation algorithms for the minimum latency problem via prize-collecting strolls
    Anna Blasiak
    Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, SIAM, pp. 429-447
    Importance sampling via load-balanced facility location
    Shankar Krishnan
    Integer Programming and Combinatorial Optimization, 13th International Conference, IPCO 2008, Springer, pp. 316-330
    Characterizing truthful mechanisms with convex type spaces
    Robert Kleinberg
    SIGecom Exchanges, 7(2008)
    A faster, better approximation algorithm for the minimum latency problem
    Asaf Levin
    David P. Williamson
    SIAM Journal on Computing, 37(2008), pp. 1472-1498
    Optimizing dispersal corridors for the Cape Proteaceae using network flow
    Steven J. Phillips
    Paul Williams
    Guy Midgley
    Ecological Applications, 18(2008), pp. 1200-1211
    The 15 puzzle: How it drove the world crazy, by Jerry Slocum and Dic Sonneveld [book review]
    Mathematical Intelligencer, 29(2007), pp. 83-85
    Frugal path mechanisms
    Eva Tardos
    ACM Transactions on Algorithms, 3(2007)
    Approximation and collusion in multicast cost sharing
    Joan Feigenbaum
    Arvind Krishnamurthy
    Rahul Sami
    Scott Shenker
    Games and Economic Behavior, 47(2004), pp. 36-71
    Approximate classification via earthmover metrics
    Jittat Fakcharoenphol
    Robert Krauthgamer
    Kunal Talwar
    Eva Tardos
    Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, SIAM, pp. 1079-1087
    An approximate truthful mechanism for combinatorial auctions with single parameter agents
    Christos H. Papadimitriou
    Kunal Talwar
    Eva Tardos
    Internet Mathematics, 1(2003), pp. 129-150
    Lagrangian relaxation for the k-median problem: new insights and continuity properties
    Ranjithkumar Rajagopalan
    David B. Shmoys
    Algorithms - ESA 2003, 11th Annual European Symposium, Springer, pp. 31-42
    Truthful mechanisms for one-parameter agents
    Eva Tardos
    42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, pp. 482-491
    Two O(log* k)-approximation algorithms for the asymmetric k-center problem
    Integer Programming and Combinatorial Optimization, 8th International IPCO Conference(2001), pp. 1-14
    March Mathness: an analysis of a nonstandard basketball pool
    Rick Cleary
    Robin Lock
    John Trono
    Math Horizons(2001), pp. 17-21
    On the upper chromatic numbers of the reals
    Discrete Mathematics, 214(2000), pp. 65-75
    A modern treatment of the 15 puzzle
    American Mathematical Monthly, 106(1999), pp. 793-799
    A case for stricter grading
    Andrew D. Hutchings
    Brian Johnson
    UMAP Journal, 19(1998), pp. 299-313