MillWheel: Fault-Tolerant Stream Processing at Internet Scale

Tyler Akidau; Alex Balikov; Kaya Bekiroglu; Slava Chernyak; Josh Haberman; Reuven Lax; Sam McVeety; Daniel Mills; Paul Nordstrom; Sam Whittle

MillWheel: Fault-Tolerant Stream Processing at Internet Scale

Tyler Akidau

Alex Balikov

Kaya Bekiroglu

Slava Chernyak

Josh Haberman

Reuven Lax

Sam McVeety

Daniel Mills

Paul Nordstrom

Sam Whittle

Very Large Data Bases (2013), pp. 734-746

Google Scholar

Abstract

MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

MillWheel: Fault-Tolerant Stream Processing at Internet Scale

Abstract

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs