CliqueMap: Productionizing an RMA-Based Distributed Caching System

Aditya Akella
Amanda Strominger
Arjun Singhvi
Maggie Anderson
Rob Cauble
Thomas F. Wenisch
SIGCOMM 2021 (to appear)

Abstract

Distributed caching is a key component in the design of performant, scalable Internet services, but accessing such caches via RPC incurs high cost. Remote Memory Access (RMA) offers a promising, less costly alternative, but achieving a rich production feature set with RMA-based systems is a significant challenge, as the rich abstraction of RPC lends itself to solutions for interoperability and upgradeability requirements of real systems. This work describes CliqueMap, a fully productionized RMA/RPC hybrid serving and caching system, and the production experience derived from three years of operation in Google’s datacenters. Building on internal technologies, CliqueMap serves multiple internal product areas and underlies several end-user-visible services.