
Ankur Agiwal
Authored Publications
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google
Thanh Do
Indrajit Roy
Haoyu Gao
Tao Lin
Mayank Singh Shishodia
Jianyi Liang
Sujata Sunil Kosalge
Tianhang Sun
Jay Patel
Ming Dai
Junichi Tatemura
Raman Grover
Kevin Lai
Min Chen
Xi Mao
Jeff Naughton
Bo Huang
Yao Liu
Prasanna Venkatasubramanian
Prashant Mishra
Yanlai Huang
Ramkumar Vadali
Maya Meng
Divyakant Agrawal
Kefei Zhang
Jim Chen
Justin Tang
Haoyan Geng
Li Liu
Vijayshankar Raman
Sagar Trehan
Sourashis Roy
Zeleng Zhuang
Joey Zhang
Adam Li
Yupu Zhang
Hakan Hacıgümüş
Proceedings of the VLDB Endowment (PVLDB), 14 (12) (2021), pp. 2986-2998
There are numerous Google services that continuously generate vast amounts of log data, which are used to provide valuable insights to internal and external business users. We need to store and serve these planet-scale data sets under extremely demanding requirements: scalability, sub-second query response times, availability even in the case of entire data center failures, strong consistency guarantees, and ingestion of a massive stream of updates coming from applications used around the globe. We have developed and deployed in production an analytical data management system, called Napa, to meet these requirements. Napa is the backend for multiple internal and external clients in Google, so there is a strong expectation of robust, variance-free query performance. At its core, Napa’s principal technology for robust query performance is the aggressive use of materialized views that are maintained consistently as new data is ingested across multiple data centers. Our clients also demand the flexibility to adjust their query performance, data freshness, and costs to suit their unique needs. Robust query processing and flexible configuration of client databases are the hallmarks of Napa’s design. Most of the related work in this area takes advantage of full flexibility to design the whole system without the need to support a diverse set of preexisting use cases, whereas Napa must deal with the hard constraints of applications that differ in which characteristics of the system are most important to optimize. Those constraints led us to make particular design decisions and to devise new techniques to meet the challenges. In this paper, we share our experiences in designing, implementing, deploying, and running Napa in production with some of Google’s most demanding applications.
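To make the idea of a materialized view "maintained as new data is ingested" concrete, here is a minimal, single-process sketch of incremental view maintenance. It assumes a hypothetical SUM-per-key aggregate view over an append-only base table; the class and method names (`AggregateView`, `ingest`, `query`, `recompute`) are illustrative inventions, not Napa's API, and the sketch ignores multi-datacenter replication entirely. The point it shows is that each ingested batch is reduced to a small view delta, so queries read precomputed aggregates instead of rescanning the base data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row shape: (group key, metric value).
Row = Tuple[str, int]


class AggregateView:
    """Hypothetical SUM-per-key materialized view over an append-only base table."""

    def __init__(self) -> None:
        self.base: List[Row] = []                    # raw ingested rows
        self.sums: Dict[str, int] = defaultdict(int)  # materialized aggregate
        self.version = 0                              # batches reflected in base and view

    def ingest(self, batch: Iterable[Row]) -> None:
        # Reduce the incoming batch to a view delta instead of rescanning the base table.
        rows = list(batch)
        delta: Dict[str, int] = defaultdict(int)
        for key, value in rows:
            delta[key] += value
        # Apply base rows and view delta in the same ingest step; a real system
        # would need to commit both atomically (assumption, not modeled here).
        self.base.extend(rows)
        for key, value in delta.items():
            self.sums[key] += value
        self.version += 1

    def query(self, key: str) -> int:
        # Served from the precomputed view; .get does not insert missing keys.
        return self.sums.get(key, 0)

    def recompute(self, key: str) -> int:
        # Full scan of the base table, used only to check the view's correctness.
        return sum(v for k, v in self.base if k == key)


view = AggregateView()
view.ingest([("a", 3), ("b", 5)])
view.ingest([("a", 4)])
print(view.query("a"), view.recompute("a"))  # prints "7 7"
```

The design trade-off this illustrates is the one the abstract alludes to: ingestion does extra work per batch so that query latency depends on the size of the view rather than the size of the raw data.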
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing
Shuo Wu
Fan Yang
Sandeep Dhoot
Adam Kirsch
David Jones
Jason Govig
Kevin Lai
Masood Siddiqi
Jamie Cameron
Kelvin Chan
Divyakant Agrawal
Abhilash Kumar
Mingsheng Hong
Andrey Gubarev
Shivakumar Venkataraman
VLDB (2014)
Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Specifically, Mesa handles petabytes of data, processes millions of row updates per second, and serves billions of queries that fetch trillions of rows per day. Mesa is geo-replicated across multiple datacenters and provides consistent and repeatable query answers at low latency, even when an entire datacenter fails. This paper presents the Mesa system and reports the performance and scale that it achieves.