Jump to Content
Kevin Aydin

Kevin Aydin

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Cache-aware load balancing of data center applications
    Aaron Schild
    Ray Yang
    Richard Zhuang
    Proceedings of the VLDB Endowment, vol. 12 (2019), pp. 709-723
    Preview abstract Our deployment of cache-aware load balancing in the Google web search backend reduced cache misses by ~0.5x, contributing to a double-digit percentage increase in the throughput of our serving clusters by relieving a bottleneck. This innovation has benefited all production workloads since 2015, serving billions of queries daily. A load balancer forwards each query to one of several identical serving replicas. The replica pulls each term's postings list into RAM from flash, either locally or over the network. Flash bandwidth is a critical bottleneck, motivating an application-directed RAM cache on each replica. Sending the same term reliably to the same replica would increase the chance it hits cache, and avoid polluting the other replicas' caches. However, most queries contain multiple terms and we have to send the whole query to one replica, so it is not possible to achieve a perfect partitioning of terms to replicas. We solve this via a voting scheme, whereby the load balancer conducts a weighted vote by the terms in each query, and sends the query to the winning replica. We develop a multi-stage scalable algorithm to learn these weights. We first construct a large-scale term-query graph from logs and apply a distributed balanced graph partitioning algorithm to cluster each term to a preferred replica. This yields a good but simplistic initial voting table, which we then iteratively refine via cache simulation to capture feedback effects. View details
    Randomized Experimental Design via Geographic Clustering
    David Rolnick
    Amir Najmi
    Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019)
    Preview abstract Web-based services often run randomized experiments to improve their products. A popular way to run these experiments is to use geographical regions as units of experimentation, since this does not require tracking of individual users or browser cookies. Since users may issue queries from multiple geographical locations, georegions cannot be considered independent and interference may be present in the experiment. In this paper, we study this problem, and first present GeoCUTS, a novel algorithm that forms geographical clusters to minimize interference while preserving balance in cluster size. We use a random sample of anonymized traffic from Google Search to form a graph representing user movements, then construct a geographically coherent clustering of the graph. Our main technical contribution is a statistical framework to measure the effectiveness of clusterings. Furthermore, we perform empirical evaluations showing that the performance of GeoCUTS is comparable to hand-crafted geo-regions with respect to both novel and existing metrics. View details
    Variance Reduction in Bipartite Experiments through Correlation Clustering
    Warren Schudy
    Thirty-third Conference on Neural Information Processing Systems (2019) (to appear)
    Preview abstract Causal inference in randomized experiments typically assumes that the units of randomization and the units of analysis are one and the same. In some applications, however, these two roles are played by distinct entities linked by a bipartite graph. The key challenge in such bipartite settings is how to avoid interference bias, which would typically arise if we simply randomized the treatment at the level of analysis units. One effective way of minimizing interference bias in standard experiments is through cluster randomization, but this design has not been studied in the bipartite setting where conventional clustering schemes can lead to poorly powered experiments. This paper introduces a novel clustering objective and a corresponding algorithm that partitions a bipartite graph so as to maximize the statistical power of a bipartite experiment on that graph. Whereas previous work relied on balanced partitioning, our formulation suggests the use of a correlation clustering objective. We use a publicly-available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments. View details
    Preview abstract Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the results together, leading to certain suboptimality from the interaction among different pieces. In other cases, links between different parts may show up in the running time and/or network communications cost, hence the desire to have small cut size. We study a distributed balanced-partitioning problem where the goal is to partition the vertices of a given graph into k pieces so as to minimize the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks such as MapReduce. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques including local swaps,minimum cuts on the boundaries of partitions, as well as contraction and dynamic programming. As our empirical study, we compare the above techniques with each other, and also to previous work in distributed graph algorithms, e.g., a label-propagation method [UB13], FENNEL [TGRV14] and Spinner [MLS14]. We report our results both on a private map graph and several public social networks,and show that our results beat previous distributed algorithms: For instance, compared to the label-propagation algorithm [UB13], we report an improvement of 15-25% in the cut value. We also observe that our algorithms admit scalable distributed implementation for any number of partitions. Finally, we explain three applications of this work at Google. •Balanced partitioning is used to route multi-term queries to different replicas in Google Search backend in a way that reduces the cache miss rates by≈0.5%, which leads to a double-digit gain in throughput of production clusters [AAB+19]. •Applied to the Google Maps Driving Directions, balanced partitioning minimizes the number of cross-shard queries with the goal of saving in CPU usage. This system achieves load balancing by dividing the world graph into several “shards.” Live experiments demonstrate an≈40% drop in the number of cross-shard queries when compared to a standard geography-based method. •In a job scheduling problem for our data centers, we use balanced partitioning to evenly distribute the work while minimizing the amount of communication across geographically distant servers. In fact, the hierarchical nature of our solution goes well with the layering of data center servers, where certain machines are closer to each other and have faster links to one another. View details
    Distributed Balanced Partitioning via Linear Embedding
    Ninth ACM International Conference on Web Search and Data Mining (WSDM), ACM (2016), pp. 387-396
    Preview abstract Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems: in some cases, a big graph is chopped into pieces that fit on one machine to be processed independently before stitching the results together, leading to certain suboptimality from the interaction among different pieces. In other cases, links between different parts may show up in the running time and/or network communications cost, hence the desire to have small cut size. We study a distributed balanced partitioning problem where the goal is to partition the vertices of a given graph into k pieces, minimizing the total cut size. Our algorithm is composed of a few steps that are easily implementable in distributed computation frameworks, e.g., MapReduce. The algorithm first embeds nodes of the graph onto a line, and then processes nodes in a distributed manner guided by the linear embedding order. We examine various ways to find the first embedding, e.g., via a hierarchical clustering or Hilbert curves. Then we apply four different techniques such as local swaps, minimum cuts on partition boundaries, as well as contraction and dynamic programming. Our empirical study compares the above techniques with each other, and to previous work in distributed algorithms, e.g., a label propagation method [34], FENNEL [32] and Spinner [23]. We report our results both on a private map graph and several public social networks, and show that our results beat previous distributed algorithms: we notice, e.g., 15-25% reduction in cut size over [34]. We also observe that our algorithms allow for scalable distributed implementation for any number of partitions. Finally, we apply our techniques for the Google Maps Driving Directions to minimize the number of multi-shard queries with the goal of saving in CPU usage. During live experiments, we observe an ≈ 40% drop in the number of multi-shard queries when comparing our method with a standard geography-based method. View details
    No Results Found