Learning-based Memory Allocation for C++ Server Workloads

Martin Maas; David G. Andersen; Michael Isard; Mohammad Mahdi Javanmard; Kathryn S. McKinley; Colin Raffel

25th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2020) (to appear)

Abstract

Modern C++ servers have memory footprints that vary widely over time, causing persistent heap fragmentation of up to 2x from long-lived objects allocated during peak memory usage. This fragmentation is exacerbated by the use of huge (2MB) pages, a requirement for high performance on large heap sizes. Reducing fragmentation automatically is challenging because C++ memory managers cannot move objects.

This paper presents a new approach to huge page fragmentation. It combines modern machine learning techniques with a novel memory manager (LLAMA) that manages the heap based on object lifetimes and huge pages (divided into blocks and lines). A neural network-based language model predicts lifetime classes using symbolized calling contexts. The model learns context-sensitive per-allocation site lifetimes from previous runs, generalizes over different binary versions, and extrapolates from samples to unobserved calling contexts. Instead of size classes, LLAMA's heap is organized by lifetime classes that are dynamically adjusted based on observed behavior at a block granularity.
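To illustrate what organizing a heap by lifetime classes rather than size classes might look like, the following is a minimal C++ sketch. The class boundaries (powers of ten in milliseconds), the names `kUpperBoundMs`, `LifetimeClass`, and `LifetimeBins`, and the per-class counters are all assumptions made for this illustration; they are not taken from LLAMA's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical upper bounds (in milliseconds) for each lifetime class.
// The decade spacing is an assumption chosen for illustration.
constexpr std::array<std::int64_t, 6> kUpperBoundMs = {
    10, 100, 1000, 10000, 100000, 1000000};

// One extra class holds everything that outlives the largest bound.
constexpr std::size_t kNumClasses = kUpperBoundMs.size() + 1;

// Maps an observed or predicted lifetime to the index of the smallest
// class whose upper bound covers it.
std::size_t LifetimeClass(std::int64_t lifetime_ms) {
  assert(lifetime_ms >= 0);
  for (std::size_t c = 0; c < kUpperBoundMs.size(); ++c) {
    if (lifetime_ms <= kUpperBoundMs[c]) return c;
  }
  return kUpperBoundMs.size();  // long-lived class
}

// Toy stand-in for a heap that groups objects by lifetime class:
// it only counts live objects per class instead of managing memory.
struct LifetimeBins {
  std::array<std::size_t, kNumClasses> live{};

  void Place(std::size_t lifetime_class) {
    assert(lifetime_class < kNumClasses);
    ++live[lifetime_class];
  }
};
```

Grouping objects that are expected to die at about the same time means a whole block tends to become free at once, which is the intuition behind using lifetimes to limit fragmentation when objects cannot be moved.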

On several production servers, LLAMA reduces memory fragmentation by up to 78% while using only huge pages. We address ML-specific questions such as tolerating mispredictions and amortizing expensive predictions across application execution. Although our results focus on memory allocation, the questions we identify apply to other system-level problems with strict latency and resource requirements where machine learning could be applied.
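One plausible way to amortize expensive predictions is to cache the predicted lifetime class per calling context, so the model runs only on the first allocation from each context. The sketch below assumes this caching scheme; the `PredictionCache` class, the use of a 64-bit hash as the context key, and the `Correct` method for overriding a mispredicted entry are hypothetical and are not described in the abstract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Caches lifetime-class predictions keyed by a hash of the calling context.
// The model is any callable; here it stands in for an expensive ML model.
class PredictionCache {
 public:
  using Model = std::function<int(std::uint64_t)>;

  explicit PredictionCache(Model model) : model_(std::move(model)) {
    assert(model_ != nullptr);
  }

  // Returns the cached class if present; otherwise runs the model once
  // and remembers the result for later allocations from the same context.
  int Predict(std::uint64_t context_hash) {
    auto it = cache_.find(context_hash);
    if (it != cache_.end()) return it->second;
    ++model_calls_;
    const int predicted = model_(context_hash);
    cache_.emplace(context_hash, predicted);
    return predicted;
  }

  // Hypothetical misprediction handling: overwrite the cached class with
  // the class actually observed at run time.
  void Correct(std::uint64_t context_hash, int observed_class) {
    cache_[context_hash] = observed_class;
  }

  std::size_t model_calls() const { return model_calls_; }

 private:
  Model model_;
  std::unordered_map<std::uint64_t, int> cache_;
  std::size_t model_calls_ = 0;
};
```

With a cache like this, repeated allocations from a hot call site pay only for a hash lookup, and a wrong prediction can be corrected without invoking the model again.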
