Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google

Ankur Agiwal; Kevin Lai; Gokul Nath Babu Manoharan; Indrajit Roy; Jagan Sankaranarayanan; Hao Zhang; Tao Zou; Min Chen; Jim Chen; Ming Dai; Thanh Do; Haoyu Gao; Haoyan Geng; Raman Grover; Bo Huang; Yanlai Huang; Adam Li; Jianyi Liang; Tao Lin; Li Liu; Yao Liu; Xi Mao; Maya Meng; Prashant Mishra; Jay Patel; Rajesh S R; Vijayshankar Raman; Sourashis Roy; Mayank Singh Shishodia; Tianhang Sun; Justin Tang; Junichi Tatemura; Sagar Trehan; Ramkumar Vadali; Prasanna Venkatasubramanian; Joey Zhang; Kefei Zhang; Yupu Zhang; Zeleng Zhuang; Goetz Graefe; Divyakanth Agrawal; Jeff Naughton; Sujata Sunil Kosalge; Hakan Hacıgümüş

Proceedings of the VLDB Endowment (PVLDB), 14 (12) (2021), pp. 2986-2998

Abstract

Numerous Google services continuously generate vast amounts of log data that are used to provide valuable insights to internal and external business users. We need to store and serve these planet-scale data sets under extremely demanding requirements: scalability, sub-second query response times, availability even in the case of entire data center failures, strong consistency guarantees, and ingestion of a massive stream of updates coming from applications used around the globe. We have developed and deployed in production an analytical data management system, called Napa, to meet these requirements. Napa is the backend for multiple internal and external clients at Google, so there is a strong expectation of variance-free, robust query performance. At its core, Napa’s principal technologies for robust query performance include the aggressive use of materialized views that are maintained consistently as new data is ingested across multiple data centers. Our clients also demand the flexibility to adjust their query performance, data freshness, and costs to suit their unique needs. Robust query processing and flexible configuration of client databases are the hallmarks of Napa’s design. Most related work in this area takes advantage of full flexibility to design the whole system without needing to support a diverse set of preexisting use cases, whereas Napa must deal with the hard constraints of applications that differ on which characteristics of the system are most important to optimize. Those constraints led us to make particular design decisions and to devise new techniques to meet the challenges. In this paper, we share our experiences in designing, implementing, deploying, and running Napa in production with some of Google’s most demanding applications.
