MillWheel: Fault-Tolerant Stream Processing at Internet Scale

Tyler Akidau
Alex Balikov
Kaya Bekiroglu
Slava Chernyak
Josh Haberman
Reuven Lax
Sam McVeety
Daniel Mills
Paul Nordstrom
Sam Whittle
Very Large Data Bases (2013), pp. 734-746
Google Scholar

Abstract

MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees. This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.