Evolve or Die: High-Availability Design Principles Drawn from Google's Network Infrastructure

Ramesh Govindan; Ina Minei; Mahesh Kallahalla; Bikash Koley; Amin Vahdat

ACM SIGCOMM (2016)

Abstract

Maintaining the highest levels of availability for content providers is challenging in the face of scale, network evolution, and complexity. Little, however, is known about the network failures large content providers are susceptible to, and what mechanisms they employ to ensure high availability. From a detailed analysis of over 100 high-impact failure events within Google’s network, encompassing many data centers and two WANs, we quantify several dimensions of availability failures. We find that failures are evenly distributed across different network types and across data, control, and management planes, but that a large number of failures happen when a network management operation is in progress within the network. We discuss some of these failures in detail, and also describe our design principles for high availability motivated by these failures. These include using defense in depth, maintaining consistency across planes, failing open on large failures, carefully preventing and avoiding failures, and assessing root cause quickly. Our findings suggest that, as networks become more complicated, failures lurk everywhere, and, counter-intuitively, continuous incremental evolution of the network can, when applied together with our design principles, result in a more robust network.

Research Areas

Networking
