Data Center Power Oversubscription with a Medium Voltage Power Plane and Priority-Aware Capping

David Landhuis
Shaohong Li
Darren De Ronde
Thomas Blooming
Anand Ramesh
James Kennedy
Christopher Malone
Jimmy Clidaras
Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, Association for Computing Machinery, New York, NY, USA (2020), 497–511

Abstract

As major web and cloud service providers continue to accelerate the demand for new data center capacity worldwide, the importance of power oversubscription as a lever to reduce provisioning costs has never been greater. Building on insights from Google-scale deployments, we design and deploy a new architecture across hardware and software to improve power oversubscription significantly. Our design includes (1) a new medium voltage power plane to enable larger power sharing domains (across tens of MW of equipment) and (2) a scalable, fast, and robust power capping service coordinating multiple priorities of workload on every node. Over several years of production deployment, our co-design has enabled power oversubscription of 25% or higher, saving hundreds of millions of dollars of data center costs, while preserving the desired availability and performance of all workloads.
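To make the two ideas in the abstract concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that "25% oversubscription" means deploying equipment whose aggregate peak rating is 1.25 times the power limit of the sharing domain, and that capping throttles lower-priority work first. The names `Task`, `oversubscribed_capacity`, and `cap_by_priority`, and all the numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int    # assumed convention: higher value = more important
    power_kw: float  # current power draw in kW


def oversubscribed_capacity(limit_kw: float, ratio: float) -> float:
    """Aggregate peak rating deployable under limit_kw at the given ratio.

    Assumes oversubscription ratio = (deployed peak / limit) - 1.
    """
    return limit_kw * (1.0 + ratio)


def cap_by_priority(tasks: list[Task], limit_kw: float) -> dict[str, float]:
    """Return per-task power allowances, cutting lowest priority first."""
    allowed = {t.name: t.power_kw for t in tasks}
    excess = sum(t.power_kw for t in tasks) - limit_kw
    for t in sorted(tasks, key=lambda t: t.priority):
        if excess <= 0:
            break  # already within the limit
        cut = min(t.power_kw, excess)
        allowed[t.name] -= cut
        excess -= cut
    return allowed


if __name__ == "__main__":
    print(oversubscribed_capacity(1000.0, 0.25))
    demo = [Task("serving", 1, 800.0), Task("batch", 0, 300.0)]
    print(cap_by_priority(demo, 1000.0))
```

In this toy case the domain draws 1100 kW against a 1000 kW limit, so the full 100 kW reduction falls on the lower-priority batch task and the serving task is left untouched. A production service would also need to handle measurement latency, actuation delay, and failure modes, none of which are modeled here.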