Large-scale optimization

Our mission is to develop large-scale optimization techniques and use them to improve the efficiency and robustness of infrastructure at Google.

About the team

We apply techniques from large-scale combinatorial optimization, online algorithms, and control theory to make Google’s computing infrastructure do more with less. We combine online and offline optimizations to achieve goals such as reducing search query latency, increasing model inference throughput and prediction quality, minimizing resource contention, maximizing the efficacy of caches, and eliminating unnecessary work in distributed systems. Our research is used in critical infrastructure that supports Search, Ads, Gemini, YouTube, and Cloud products.
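
To give a concrete flavor of these techniques, below is a minimal sketch of consistent hashing with bounded loads, one of the featured publications listed further down. It is an illustrative toy in Python, not the team's implementation; the function names and the balance parameter c = 1.25 are assumptions made for the example.

import hashlib
import math

def _hash(value: str) -> int:
    # Position on the hash ring, derived from a stable hash of the value.
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)

def assign(keys: list[str], servers: list[str], c: float = 1.25) -> dict[str, str]:
    # Cap every server at ceil(c * average load); c > 1 trades balance
    # quality against how far keys spill past their natural server.
    capacity = math.ceil(c * len(keys) / len(servers))
    ring = sorted(servers, key=_hash)  # servers ordered by ring position
    loads = {s: 0 for s in servers}
    placement = {}
    for key in keys:
        key_pos = _hash(key)
        # First server at or clockwise past the key's ring position.
        start = next((i for i, s in enumerate(ring) if _hash(s) >= key_pos), 0)
        for offset in range(len(ring)):
            server = ring[(start + offset) % len(ring)]
            if loads[server] < capacity:  # walk clockwise until one has room
                loads[server] += 1
                placement[key] = server
                break
    return placement

Because every server's load is capped, a popular key range cannot overload a single machine, while keys still move only locally when servers are added or removed.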

Featured publications

Load is not what you should balance: Introducing Prequal
Bartek Wydrowski, Bobby Kleinberg, Steve Rumble
(2024); a toy sketch of its replica-selection rule appears after this list
Sequential Attention for Feature Selection
Taisuke Yasuda, Lin Chen
Proceedings of the 11th International Conference on Learning Representations (2023)
Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient
Mateo Díaz, Oliver Hinder, Haihao Lu, Miles Lubin, Brendan O'Donoghue, Warren Schudy
Advances in Neural Information Processing Systems (NeurIPS 2021)
Edge-Weighted Online Bipartite Matching
Runzhou Tao, Zhiyi Huang
Journal of the ACM, 69 (2022), 45:1-45:35
Cache-aware load balancing of data center applications
Aaron Schild, Ray Yang, Richard Zhuang
Proceedings of the VLDB Endowment, 12 (2019), pp. 709-723
Consistent Hashing with Bounded Loads
Mikkel Thorup
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (2018), pp. 587-604
Almost Optimal Streaming Algorithms for Coverage Problems
Hossein Esfandiari
29th ACM Symposium on Parallelism in Algorithms and Architectures (2017)
HyperAttention: Long-context Attention in Near-Linear Time
Amin Karbasi, Amir Zandieh, Insu Han, David Woodruff
Proceedings of the 12th International Conference on Learning Representations (2024)
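
To make the Prequal entry above concrete, here is a toy sketch of its hot/cold replica-selection idea: probe a few replicas and route each query to one that looks fast right now, rather than balancing CPU load. The Replica fields, the probe count, and the fixed RIF threshold are simplifying assumptions for this sketch; the real system maintains asynchronous probe pools and adapts its hot/cold boundary.

import random
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    requests_in_flight: int     # RIF, as a probe response would report it
    latency_estimate_ms: float  # recent per-request latency estimate

def pick_replica(replicas: list[Replica], num_probes: int = 3,
                 rif_threshold: int = 10) -> Replica:
    # Power-of-d-choices: look at a small random subset, not the whole fleet.
    probed = random.sample(replicas, min(num_probes, len(replicas)))
    cold = [r for r in probed if r.requests_in_flight <= rif_threshold]
    if cold:
        # Among lightly loaded ("cold") replicas, prefer the lowest latency.
        return min(cold, key=lambda r: r.latency_estimate_ms)
    # All probed replicas are busy ("hot"): take the fewest requests in flight.
    return min(probed, key=lambda r: r.requests_in_flight)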

Highlighted work