Sean Quinlan
Authored Publications
Spanner: Google's Globally Distributed Database
Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Andrey Gubarev, Christopher Heiser, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Sergey Melnik, David Mwaura, David Nagle, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
ACM Trans. Comput. Syst. 31, 3, Article 8 (2013)
Spanner: Google's Globally-Distributed Database
Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Sergey Melnik, David Mwaura, David Nagle, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, Dale Woodford
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), USENIX (2012)
Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
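The time API the abstract refers to reports a bounded interval rather than a single timestamp, and external consistency rests on waiting that uncertainty out. The Go sketch below is a rough illustration of that idea, not Spanner's actual implementation: the names TTInterval, now, and commitWait, and the fixed uncertainty bound epsilon, are all illustrative assumptions (the real system derives its bound from GPS and atomic-clock references).

```go
package main

import (
	"fmt"
	"time"
)

// TTInterval is an illustrative stand-in for an interval clock:
// the true absolute time is guaranteed to lie in [Earliest, Latest].
type TTInterval struct {
	Earliest, Latest time.Time
}

// now returns the current time widened by an assumed fixed
// uncertainty bound epsilon.
func now(epsilon time.Duration) TTInterval {
	t := time.Now()
	return TTInterval{Earliest: t.Add(-epsilon), Latest: t.Add(epsilon)}
}

// commitWait sketches the "commit wait" rule: after choosing a
// commit timestamp s, block until s is guaranteed to be in the past
// (now().Earliest > s) before making the transaction visible.
func commitWait(s time.Time, epsilon time.Duration) {
	for !now(epsilon).Earliest.After(s) {
		time.Sleep(time.Millisecond)
	}
}

func main() {
	eps := 5 * time.Millisecond // hypothetical uncertainty bound
	s := now(eps).Latest        // a timestamp no earlier than true time
	commitWait(s, eps)
	fmt.Println("timestamp", s, "is now definitely in the past")
}
```

The point of the wait is that once s is provably in the past on every machine's interval clock, any later transaction anywhere will receive a strictly larger timestamp, which is what makes the ordering externally consistent.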
Availability in Globally Distributed Storage Systems
Daniel Ford, François Labelle, Florentina Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso
Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation, USENIX (2010)
Highly available cloud storage is often implemented with complex, multi-tiered distributed systems built on top of clusters of commodity servers and disk drives. Sophisticated management, load balancing and recovery techniques are needed to achieve high performance and availability amidst an abundance of failure sources that include software, hardware, network connectivity, and power issues. While there is a relative wealth of failure studies of individual components of storage systems, such as disk drives, relatively little has been reported so far on the overall availability behavior of large cloud-based storage services.

We characterize the availability properties of cloud storage systems based on an extensive one-year study of Google's main storage infrastructure and present statistical models that enable further insight into the impact of multiple design choices, such as data placement and replication strategies. With these models we compare data availability under a variety of system parameters given the real patterns of failures observed in our fleet.
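The replication comparison lends itself to a back-of-the-envelope calculation. The Go sketch below computes availability for R replicas each unavailable with probability p; both numbers are hypothetical, and the independence assumption is exactly the simplification that the paper's measurements of correlated failures (e.g. a rack losing power) undermine, so it should be read as an optimistic upper bound rather than the paper's model.

```go
package main

import (
	"fmt"
	"math"
)

// replicatedAvailability returns the probability that at least one of
// r replicas is reachable, assuming each replica is independently
// unavailable with probability p: 1 - p^r.
func replicatedAvailability(p float64, r int) float64 {
	return 1 - math.Pow(p, float64(r))
}

func main() {
	p := 0.01 // hypothetical per-replica unavailability
	for _, r := range []int{1, 2, 3} {
		fmt.Printf("R=%d replicas: availability %.6f\n",
			r, replicatedAvailability(p, r))
	}
}
```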
Interpreting the Data: Parallel Analysis with Sawzall
Rob Pike
Scientific Programming Journal, 13 (2005), pp. 277-298
Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on.
We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines.
Animation: the paper references a movie showing how the distribution of requests to google.com around the world changed through the day on August 14, 2003.
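The two-phase shape the abstract describes (per-record filtering that emits into commutative, associative aggregators) is easy to sketch outside Sawzall itself. The Go fragment below is not Sawzall code; it mirrors the pattern with a hypothetical emit step feeding count, total, and sum-of-squares tables. Because the aggregators are order-independent, per-machine tables can be computed in parallel and merged afterwards.

```go
package main

import "fmt"

// tables collects three sum-style aggregators. Sums are commutative
// and associative, so records can be processed on many machines in
// any order and the partial tables merged at the end.
type tables struct {
	count        int64
	total        float64
	sumOfSquares float64
}

// emit models the filtering phase's "emit value to table" step for
// one input record.
func (t *tables) emit(x float64) {
	t.count++
	t.total += x
	t.sumOfSquares += x * x
}

func main() {
	// One logical record stream; in the real system this loop is
	// distributed over hundreds or thousands of machines.
	records := []float64{1.5, 2.0, 3.25}
	var t tables
	for _, x := range records {
		t.emit(x)
	}
	fmt.Println(t.count, t.total, t.sumOfSquares)
}
```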