Google Research

Reliable Data Processing with Minimal Toil

(2021)

Abstract

This paper discusses an approach for making data pipelines both safer and less manual. We detail how we applied well known reliability best practices from user-facing services to batch jobs that underpin many of the services that make up Google Workspace. Using validation steps, canarying, and target populations for data pipelines, we ensure that only stable versions are promoted to the next environment stage. By moving to a single, standardized platform we minimized duplicate effort across services. We also touch on how we optimized batch jobs for both correctness and freshness SLOs, and the benefits of batch jobs vs. async event-based processing.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work