Availability in Globally Distributed Storage Systems

Daniel Ford; Francois Labelle; Florentina Popovici; Murray Stokely; Van-Anh Truong; Luiz Barroso; Carrie Grimes; Sean Quinlan

Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, USENIX (2010)

Abstract

Highly available cloud storage is often implemented with
complex, multi-tiered distributed systems built on top of clusters of
commodity servers and disk drives. Sophisticated management, load
balancing and recovery techniques are needed to achieve high
performance and availability amidst an abundance of failure sources
that include software, hardware, network connectivity, and power issues. While
there is a relative wealth of failure studies of individual components of
storage systems, such as disk drives, relatively little has been
reported so far on the overall availability behavior of large
cloud-based storage services.

We characterize the availability properties of cloud
storage systems based on an extensive one-year study of Google's
main storage infrastructure and present statistical models
that enable further insight into the impact of multiple
design choices, such as data placement and replication strategies.
With these models we compare data availability under a variety of
system parameters given the real patterns of failures observed in our fleet.
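To make "comparing data availability under different replication strategies" concrete, here is a deliberately simplified sketch. It is not the statistical model the authors present. It assumes that data is split into a hypothetical "stripe" of n chunks, that any k of those chunks suffice to read the data, and that each chunk is unavailable independently with the same probability p. Failures that share a cause, such as the power and network issues mentioned above, violate that independence assumption, so treat the numbers as illustrative only.

```python
from math import comb


def stripe_unavailability(n: int, k: int, p: float) -> float:
    """Probability that a stripe of n chunks, any k of which suffice to
    read the data, is unavailable.

    Simplifying assumption: each chunk is unavailable independently
    with the same probability p (no correlated failures).
    """
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # Data is unreadable when more than n - k chunks are down at once,
    # i.e. when i >= n - k + 1 chunks are unavailable (binomial tail).
    return sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(n - k + 1, n + 1)
    )


def replication_unavailability(r: int, p: float) -> float:
    """R-way replication is the special case n = r, k = 1."""
    return stripe_unavailability(r, 1, p)


if __name__ == "__main__":
    p = 0.01  # hypothetical per-chunk unavailability, not a measured value
    print("3-way replication:", replication_unavailability(3, p))
    print("(n=9, k=6) erasure code:", stripe_unavailability(9, 6, p))
```

Treating replication as the k = 1 case of the same binomial tail lets one function compare both schemes under identical assumptions. Capturing correlated failures would require a richer model than this independent-chunk calculation.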
