Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator

Andrew Hamilton Hunter

Chris Kennelly

Darryl Gove

Parthasarathy Ranganathan

Paul Jack Turner

Tipp James Moseley

15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21) (2021) (to appear)


Abstract

Memory allocation represents a significant compute cost at warehouse scale, and its optimization can yield considerable cost savings. One classical approach is to increase the efficiency of an allocator to minimize the cycles spent in the allocator code. However, memory allocation decisions also impact overall application performance via data placement, offering opportunities to improve fleetwide productivity by completing more units of application work using fewer hardware resources. Here, we focus on hugepage coverage. We present TEMERAIRE, a hugepage-aware enhancement of TCMALLOC to reduce CPU overheads in the application’s code. We discuss the design and implementation of TEMERAIRE, including strategies for hugepage-aware memory layouts to maximize hugepage coverage and to minimize fragmentation overheads. We present application studies for 8 applications, improving requests-per-second (RPS) by 7.7% and reducing RAM usage by 2.4%. We present the results of a 1% experiment at fleet scale as well as the longitudinal rollout in Google’s warehouse-scale computers. This yielded 6% fewer TLB miss stalls and a 26% reduction in memory wasted due to fragmentation. We conclude with a discussion of additional techniques for improving the allocator development process and potential optimization strategies for future memory allocators.

Research Areas

Software systems
