Hao Zhang
Authored Publications

Progressive Partitioning for Parallelized Query Execution in Google’s Napa
Junichi Tatemura, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Divyakant Agrawal, Brad Adelberg, Shilpa Kolhar, Indrajit Roy
49th International Conference on Very Large Data Bases, VLDB (2023), pp. 3475-3487
Napa powers Google's critical data warehouse needs. It uses a Log-Structured Merge-Tree (LSM) for real-time data ingestion and achieves sub-second query latency for billions of queries per day. Napa handles a wide variety of query workloads, from full-table scans to range scans and multi-key lookups. Our design challenge is to handle this diverse query workload while it runs concurrently. In particular, a large percentage of our query volume consists of external reporting queries characterized by multi-key lookups with strict sub-second query latency targets.
Query parallelization, which is achieved by partitioning the input data and processing the partitions in parallel (i.e., the SIMD model of computation), is an important technique for meeting the low latency targets. Traditionally, the effectiveness of parallelizing a query depends heavily on how well it aligns with the data partitioning established at write time. Unfortunately, such a write-time partitioning scheme cannot handle the highly variable parallelization requirements that arise on a per-query basis.
The key to Napa’s success is its ability to adapt query parallelization on a per-query basis. This paper describes an index-based approach to data partitioning for queries that have sub-second latency requirements. Napa’s approach is progressive in that it can provide good partitioning within the time budgeted for partitioning. Since the end-to-end query time also includes the time to perform partitioning, there is a tradeoff between the time spent on partitioning and the evenness of the resulting partitions. Our approach balances these opposing considerations to provide sub-second querying for billions of queries each day. We use production data to establish the effectiveness of Napa’s approach across workloads ranging from the easy to handle to the most pathological.
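
To make the tradeoff between partitioning time and evenness concrete, here is a minimal sketch. It is not Napa's actual implementation. It assumes a hypothetical `estimate_rows(lo, hi)` callback, standing in for an index lookup that returns an approximate row count for a key range, and it refines the partitioning by repeatedly splitting the heaviest range at its key midpoint until either the target number of partitions is reached or the time budget expires. All names and parameters are illustrative assumptions.

```python
import time
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open integer key range [lo, hi)


def progressive_partition(
    key_range: Range,
    estimate_rows: Callable[[int, int], int],  # hypothetical index-backed estimator
    target_parts: int,
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[Range]:
    """Refine a partitioning until the target is met or the budget runs out.

    A valid partitioning of key_range exists at every step, so the caller
    can stop at any time and still run the query, only less evenly split.
    """
    deadline = clock() + budget_seconds
    parts: List[Range] = [key_range]
    while len(parts) < target_parts and clock() < deadline:
        # Pick the partition with the largest estimated row count.
        heaviest = max(range(len(parts)), key=lambda i: estimate_rows(*parts[i]))
        lo, hi = parts[heaviest]
        if hi - lo < 2:
            break  # the heaviest range is a single key and cannot be split
        mid = (lo + hi) // 2
        # Replace the heaviest range with its two halves, keeping key order.
        parts[heaviest:heaviest + 1] = [(lo, mid), (mid, hi)]
    return parts


if __name__ == "__main__":
    # Assumed toy estimator: rows are spread uniformly over the key space.
    uniform = lambda lo, hi: hi - lo
    print(progressive_partition((0, 100), uniform, target_parts=4, budget_seconds=0.01))
```

In this sketch a zero budget returns the whole key range as a single partition, while a larger budget allows more refinement steps and therefore a more even split. The midpoint split is a simplification chosen for brevity; an index-based scheme could instead choose split points from index entries.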
              
  
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google
Kevin Lai, Indrajit Roy, Min Chen, Jim Chen, Ming Dai, Thanh Do, Haoyu Gao, Haoyan Geng, Raman Grover, Bo Huang, Yanlai Huang, Adam Li, Jianyi Liang, Tao Lin, Li Liu, Yao Liu, Xi Mao, Maya Meng, Prashant Mishra, Jay Patel, Vijayshankar Raman, Sourashis Roy, Mayank Singh Shishodia, Tianhang Sun, Justin Tang, Junichi Tatemura, Sagar Trehan, Ramkumar Vadali, Prasanna Venkatasubramanian, Joey Zhang, Kefei Zhang, Yupu Zhang, Zeleng Zhuang, Divyakant Agrawal, Jeff Naughton, Sujata Sunil Kosalge, Hakan Hacıgümüş
Proceedings of the VLDB Endowment (PVLDB), 14 (12) (2021), pp. 2986-2998
Numerous Google services continuously generate vast amounts of log data that are used to provide valuable insights to internal and external business users. We need to store and serve these planet-scale data sets under extremely demanding requirements: scalability, sub-second query response times, availability even when entire data centers fail, strong consistency guarantees, and ingestion of a massive stream of updates from applications used around the globe. We have developed and deployed in production an analytical data management system, called Napa, to meet these requirements. Napa is the backend for multiple internal and external clients at Google, so there is a strong expectation of variance-free, robust query performance. At its core, Napa’s principal technologies for robust query performance include the aggressive use of materialized views that are maintained consistently as new data is ingested across multiple data centers. Our clients also demand the flexibility to adjust their query performance, data freshness, and costs to suit their unique needs. Robust query processing and flexible configuration of client databases are the hallmarks of Napa’s design. Most related work in this area takes advantage of full flexibility to design the whole system without needing to support a diverse set of preexisting use cases, whereas Napa must deal with the hard constraints of applications that differ on which characteristics of the system are most important to optimize. Those constraints led us to make particular design decisions and to devise new techniques to meet the challenges. In this paper, we share our experiences in designing, implementing, deploying, and running Napa in production with some of Google’s most demanding applications.
              
  