Cache-aware load balancing of data center applications

Aaron Archer; Kevin Aydin; MohammadHossein Bateni; Vahab Mirrokni; Aaron Schild; Ray Yang; Richard Zhuang

Cache-aware load balancing of data center applications

Aaron Archer

Kevin Aydin

MohammadHossein Bateni

Vahab Mirrokni

Aaron Schild

Ray Yang

Richard Zhuang

Proceedings of the VLDB Endowment, 12 (2019), pp. 709-723

Download Google Scholar

Abstract

Our deployment of cache-aware load balancing in the Google web search backend reduced cache misses by ~0.5x, contributing to a double-digit percentage increase in the throughput of our serving clusters by relieving a bottleneck. This innovation has benefited all production workloads since 2015, serving billions of queries daily.

A load balancer forwards each query to one of several identical serving replicas. The replica pulls each term's postings list into RAM from flash, either locally or over the network. Flash bandwidth is a critical bottleneck, motivating an application-directed RAM cache on each replica. Sending the same term reliably to the same replica would increase the chance it hits cache, and avoid polluting the other replicas' caches. However, most queries contain multiple terms and we have to send the whole query to one replica, so it is not possible to achieve a perfect partitioning of terms to replicas.

We solve this via a voting scheme, whereby the load balancer conducts a weighted vote by the terms in each query, and sends the query to the winning replica. We develop a multi-stage scalable algorithm to learn these weights. We first construct a large-scale term-query graph from logs and apply a distributed balanced graph partitioning algorithm to cluster each term to a preferred replica. This yields a good but simplistic initial voting table, which we then iteratively refine via cache simulation to capture feedback effects.

Research Areas

Algorithms and theory

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Cache-aware load balancing of data center applications

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs