Dan Gibson

Authored Publications
Google Publications
Other Publications
    Aquila: A unified, low-latency fabric for datacenter networks
    Hema Hariharan
    Eric Lance
    Moray Mclaren
    Stephen Wang
    Zhehua Wu
    Sunghwan Yoo
    Raghuraman Balasubramanian
    Prashant Chandra
    Michael Cutforth
    Peter James Cuy
    David Decotigny
    Rakesh Gautam
    Rick Roy
    Zuowei Shen
    Ming Tan
    Ye Tang
    Monica C Wong-Chan
    Joe Zbiciak
    Aquila: A unified, low-latency fabric for datacenter networks (2022)
    Datacenter workloads have evolved from the data intensive, loosely-coupled workloads of the past decade to more tightly coupled ones, wherein ultra-low latency communication is essential for resource disaggregation over the network and to enable emerging programming models. We introduce Aquila, an experimental datacenter network fabric built with ultra-low latency support as a first-class design goal, while also supporting traditional datacenter traffic. Aquila uses a new Layer 2 cell-based protocol, GNet, an integrated switch, and a custom ASIC with low-latency Remote Memory Access (RMA) capabilities co-designed with GNet. We demonstrate that Aquila is able to achieve under 40 μs tail fabric Round Trip Time (RTT) for IP traffic and sub-10 μs RMA execution time across hundreds of host machines, even in the presence of background throughput-oriented IP traffic. This translates to more than 5x reduction in tail latency for a production quality key-value store running on a prototype Aquila network.
    CliqueMap: Productionizing an RMA-Based Distributed Caching System
    Aditya Akella
    Amanda Strominger
    Arjun Singhvi
    Maggie Anderson
    Rob Cauble
    Thomas F. Wenisch
    SIGCOMM 2021 (2021) (to appear)
    Distributed caching is a key component in the design of performant, scalable Internet services, but accessing such caches via RPC incurs high cost. Remote Memory Access (RMA) offers a promising, less costly alternative, but achieving a rich production feature set with RMA-based systems is a significant challenge, as the rich abstraction of RPC lends itself to solutions for interoperability and upgradeability requirements of real systems. This work describes CliqueMap, a fully productionized RMA/RPC hybrid serving and caching system, and the production experience derived from three years of operation in Google’s datacenters. Building on internal technologies, CliqueMap serves multiple internal product areas and underlies several end-user-visible services.
    1RMA: Re-Envisioning Remote Memory Access for Multi-Tenant Datacenters
    Aditya Akella
    Arjun Singhvi
    Joel Scherpelz
    Monica C Wong-Chan
    Moray Mclaren
    Prashant Chandra
    Rob Cauble
    Sean Clark
    Simon Sabato
    Thomas F. Wenisch
    Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, Association for Computing Machinery, New York, NY, USA (2020), 708–721
    Remote Direct Memory Access (RDMA) plays a key role in supporting performance-hungry datacenter applications. However, existing RDMA technologies are ill-suited to multi-tenant datacenters, where applications run at massive scales, tenants require isolation and security, and the workload mix changes over time. Our experiences seeking to operationalize RDMA at scale indicate that these ills are rooted in standard RDMA's basic design attributes: connection-orientedness and complex policies baked into hardware. We describe a new approach to remote memory access -- One-Shot RMA (1RMA) -- suited to the constraints imposed by our multi-tenant datacenter settings. The 1RMA NIC is connection-free and fixed-function; it treats each RMA operation independently, assisting software by offering fine-grained delay measurements and fast failure notifications. 1RMA software provides operation pacing, congestion control, failure recovery, and inter-operation ordering, when needed. The NIC, deployed in our production datacenters, supports encryption at line rate (100Gbps and 100M ops/sec) with minimal performance/availability disruption for encryption key rotation.
    Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs
    Dennis Abts
    Natalie Enright Jerger
    John Kim
    Mikko Lipasti
    Proceedings of the International Symposium on Computer Architecture, ACM (2009)
    In the near term, Moore's law will continue to provide an increasing number of transistors and therefore an increasing number of on-chip cores. Limited pin bandwidth prevents the integration of a large number of memory controllers on-chip. With many cores, and few memory controllers, where to locate the memory controllers in the on-chip interconnection fabric becomes an important and as yet unexplored question. In this paper, we show how the location of the memory controllers can reduce contention (hot spots) in the on-chip fabric, as well as lower the variance in reference latency which provides for predictable performance of memory-intensive applications regardless of the processing core on which a thread is scheduled. We explore the design space of on-chip fabrics to find optimal memory controller placement relative to different topologies (i.e. mesh and torus), routing algorithms, and workloads.