Our mission is to develop large-scale optimization techniques and use them to improve the efficiency and robustness of infrastructure at Google.
We apply techniques from large-scale combinatorial optimization, online algorithms, and control theory to make Google’s computing infrastructure do more with less. We combine online and offline optimizations to achieve goals such as reducing search query latency, increasing model inference throughput and prediction quality, minimizing resource contention, maximizing the efficacy of caches, and eliminating unnecessary work in distributed systems. Our research is used in critical infrastructure that supports Search, Ads, Gemini, YouTube, and Cloud products.
We developed and deployed load balancing tools based on the power-of-d-choices paradigm for Google’s distributed serving systems. Our tools drive most of the YouTube serving stack and have also found success in Search Ads, logs processing, and ML model serving, where they have substantially reduced tail latencies and error rates while saving resources by allowing our systems to run at higher utilization.
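To illustrate the core idea (a minimal sketch, not our production implementation; all names here are illustrative), a power-of-d-choices balancer samples d servers at random and routes each request to the least loaded of them:

```python
import random

def pick_server(queue_lengths, d=2):
    """Power-of-d-choices: sample d servers uniformly at random and
    route the request to the least loaded of the sampled candidates."""
    candidates = random.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])

# Toy comparison: route 10,000 requests to 100 servers with d=2 versus
# purely random single-choice routing; d=2 yields a much flatter load profile.
loads_d2, loads_rand = [0] * 100, [0] * 100
for _ in range(10_000):
    loads_d2[pick_server(loads_d2, d=2)] += 1
    loads_rand[random.randrange(100)] += 1
print(max(loads_d2), max(loads_rand))
```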
We developed a linear programming (LP) solver that can solve instances with 100 billion non-zeros, which is about 1000x larger than the previous state-of-the-art. To achieve this scale, we rely on first-order methods where the bottleneck operations are matrix-vector multiplications, so we can avoid factoring matrices. This allows us to circumvent the memory scaling bottleneck faced by traditional LP solvers. The in-memory, multi-core version of our solver is open-sourced as PDLP, within Google's OR-Tools library.
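The sketch below illustrates the first-order idea on a tiny LP using plain (unrestarted) primal-dual hybrid gradient steps; PDLP itself adds restarts, adaptive step sizes, presolve, and many other refinements, so treat this as a conceptual toy rather than the solver's implementation:

```python
import numpy as np

def pdhg_lp(c, A, b, iters=5000):
    """Vanilla primal-dual hybrid gradient for min c^T x s.t. Ax = b, x >= 0.
    Each iteration touches A only through matrix-vector products, so no
    matrix factorization (and its memory blow-up) is ever needed."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    norm_A = np.linalg.norm(A, 2)
    tau = sigma = 0.9 / norm_A          # step sizes with tau * sigma * ||A||^2 < 1
    for _ in range(iters):
        x_new = np.maximum(x - tau * (c - A.T @ y), 0.0)   # projected primal step
        y = y + sigma * (b - A @ (2 * x_new - x))          # dual step with extrapolation
        x = x_new
    return x

# Toy LP: minimize x0 + 2*x1 subject to x0 + x1 = 1, x >= 0; optimum is x = (1, 0).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
print(pdhg_lp(c, A, b).round(3))
```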
Optimizing ML model structures (e.g., feature selection, embedding table optimization, matrix sparsification, and neural architecture search) is a core problem with a significant impact on prediction quality, resource efficiency, and ultimately revenue. These are often NP-hard combinatorial optimization problems, so we developed efficient differentiable search techniques (e.g., Sequential Attention) to tackle them at scale while offering provable guarantees.
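As a rough illustration of the differentiable-search idea (a toy numpy sketch of a sequential-attention-style selection loop, not the production algorithm), each round jointly trains model weights and a softmax "attention" over the not-yet-selected features, then greedily keeps the feature with the highest attention weight:

```python
import numpy as np

def sequential_attention(X, y, k, epochs=500, lr=0.1):
    """Toy sequential-attention-style feature selection for linear regression."""
    n, d = X.shape
    selected = []
    for _ in range(k):
        remaining = [j for j in range(d) if j not in selected]
        w = np.zeros(d)                       # model weights
        logits = np.zeros(len(remaining))     # attention logits over candidate features
        for _ in range(epochs):
            attn = np.exp(logits) / np.exp(logits).sum()
            gate = np.ones(d)
            gate[remaining] = attn            # selected features pass through; candidates are soft-masked
            residual_grad = 2 * (X @ (w * gate) - y) / n
            grad_prod = X.T @ residual_grad   # gradient w.r.t. the product w * gate
            g = grad_prod[remaining] * w[remaining]
            grad_logits = attn * (g - attn @ g)   # backprop through the softmax
            w -= lr * grad_prod * gate
            logits -= lr * grad_logits
        selected.append(remaining[int(np.argmax(attn))])
    return selected

# Toy data where only features 0 and 3 carry signal; the loop should find them.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=500)
print(sequential_attention(X, y, k=2))
```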
On several projects, we have partnered with product teams to improve the efficiency of Google's search infrastructure. In the backend for web search, we introduced a distributed feedback control loop to govern the way queries are fanned out to worker machines, balancing load better by taking advantage of real-time load information on the workers. We also improved the efficacy of caching by increasing the homogeneity of the stream of queries seen by any single worker machine: we clustered query terms, used these clusters to define voting weights, and assigned each query to a worker based on the votes cast by the terms in the query. In a pub-sub system that powers some of our display ads, we saved computation by clustering some components of subscriptions in a way that lets us take advantage of bitwise-parallel CPU operations.
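As a concrete toy illustration of the vote-based assignment, the sketch below uses hypothetical term-to-cluster and cluster-to-worker maps; in practice the clusters are derived from query data and the votes can be weighted:

```python
from collections import Counter

def assign_worker(query_terms, term_cluster, cluster_worker, num_workers):
    """Each term votes for the worker pinned to its term cluster; the query is
    routed to the worker with the most votes, so similar queries tend to land
    on the same machine and its cache stays hot."""
    votes = Counter()
    for term in query_terms:
        if term in term_cluster:
            votes[cluster_worker[term_cluster[term]]] += 1
    if not votes:                                     # no known terms: fall back to hashing
        return hash(tuple(query_terms)) % num_workers
    return votes.most_common(1)[0][0]

# Hypothetical maps: cooking terms share a cluster pinned to worker 7,
# programming terms share a cluster pinned to worker 3.
term_cluster = {"pizza": 0, "pasta": 0, "recipe": 0, "python": 1, "numpy": 1}
cluster_worker = {0: 7, 1: 3}
print(assign_worker(["vegan", "pizza", "recipe"], term_cluster, cluster_worker, 16))  # -> 7
```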
Composable core-sets provide an effective method for solving optimization problems on massive datasets. The main idea is to partition the data among a number of machines and have each machine compute a small summary (sketch) of its partition. After gathering all the summaries on one machine, we solve the original optimization problem on the combined sketch.
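A minimal sketch of the pattern for a diversity-style objective, assuming a greedy farthest-point summary as the per-machine sketch (the summary used in practice depends on the objective):

```python
import numpy as np

def greedy_farthest_points(points, k):
    """Greedy farthest-point traversal: a small summary of `points` that is a
    standard core-set for k-center / diversity-style objectives."""
    chosen = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in chosen], axis=0)
        chosen.append(points[int(np.argmax(dists))])
    return np.array(chosen)

def composable_coreset_solve(partitions, k):
    """Composable core-set pattern: each 'machine' summarizes its partition
    independently, the summaries are unioned on one machine, and the final
    solution is computed on the (much smaller) union."""
    summaries = [greedy_farthest_points(part, k) for part in partitions]  # map phase
    combined = np.vstack(summaries)                                       # gather
    return greedy_farthest_points(combined, k)                            # solve on the sketch

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 8))
partitions = np.array_split(data, 10)     # stand-in for data spread over 10 machines
centers = composable_coreset_solve(partitions, k=5)
print(centers.shape)                      # (5, 8)
```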
Maximum coverage and minimum set cover are among the fundamental problems in combinatorial optimization and have a variety of applications in machine learning and data mining. We study these problems in the streaming and MapReduce models of computation and develop practical algorithms with tight theoretical bounds on runtime, space, and approximation quality. Our main idea is a novel sketching technique that compresses the input into a small space, independent of the size of the ground set, with little loss in solution quality.
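The toy sketch below captures the sketch-then-solve pattern by hashing ground-set elements down to a small sample and running greedy maximum coverage on the compressed instance; the actual sketches are more careful (for example, they also bound the number of elements kept per set):

```python
def greedy_max_coverage(sets, k):
    """Classic greedy: repeatedly pick the set covering the most new elements."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen

def sketch_then_solve(sets, k, sample_rate=0.05):
    """Subsample ground-set elements by hashing (so every set agrees on which
    elements survive without coordination), then solve on the small instance."""
    def keep(e):
        return (hash(("salt", e)) % 10_000) / 10_000 < sample_rate
    small_sets = [{e for e in s if keep(e)} for s in sets]
    return greedy_max_coverage(small_sets, k)

# Toy instance: 5 overlapping sets over a ground set of 1,000 integers.
sets = [set(range(i * 150, i * 150 + 400)) for i in range(5)]
print(sketch_then_solve(sets, k=2))       # solution found on the sketch
print(greedy_max_coverage(sets, k=2))     # greedy on the full input, for comparison
```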
We design memoryless balanced allocation algorithms that assign a dynamic set of tasks to a dynamic set of servers such that the load (the number of assigned tasks) on each server is bounded and the allocation changes only slightly with each update operation.
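A toy sketch in the spirit of consistent hashing with bounded loads, one approach in this line of work (server churn and many details are omitted; the class and the load-cap formula below are illustrative):

```python
import bisect
import hashlib
import math

def _hash(key):
    # Deterministic 64-bit hash: the allocation can be recomputed from keys and
    # current loads alone, with no extra per-task state ("memoryless").
    return int.from_bytes(hashlib.sha256(str(key).encode()).digest()[:8], "big")

class BoundedLoadRing:
    """Tasks and servers hash onto a ring; a task goes to the first server
    clockwise whose load is still under a cap of about (1 + eps) times the
    average load, so no server is ever far above average."""

    def __init__(self, servers, eps=0.25):
        self.eps = eps
        self.ring = sorted((_hash(s), s) for s in servers)
        self.load = {s: 0 for s in servers}
        self.num_tasks = 0

    def assign(self, task):
        self.num_tasks += 1
        cap = math.ceil((1 + self.eps) * self.num_tasks / len(self.load))
        start = bisect.bisect_left(self.ring, (_hash(task), ""))
        for i in range(len(self.ring)):                    # walk clockwise
            _, server = self.ring[(start + i) % len(self.ring)]
            if self.load[server] < cap:
                self.load[server] += 1
                return server
        raise RuntimeError("no server under the load cap")

ring = BoundedLoadRing([f"server-{i}" for i in range(10)])
for t in range(1_000):
    ring.assign(f"task-{t}")
print(max(ring.load.values()), min(ring.load.values()))   # max load stays within ~1.25x average
```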
David Applegate
Aaron Archer
Kevin Aydin
Mohammadhossein Bateni
Lin Chen
David Eisenstat
Matthew Fahrbach
Gang (Thomas) Fu
Joey Huchette
Siddhartha Visveswara Jayanti
Rajesh Jayaram
Vahab S. Mirrokni
Jon Orwant
Jon Schneider
Morteza Zadimoghaddam
Pratik Worah