Jagan Sankaranarayanan
I work at Google on the data infrastructure team. In my previous life, I was the department head of the data management group at NEC Labs America. I received my Ph.D. in Computer Science from the University of Maryland in 2008, where my doctoral advisor was Prof. Hanan Samet.
      Authored Publications
In the new “gig” economy, a user plays the role of a consumer as well as a service provider. As a service provider, a driver traveling from a source to a destination may opportunistically pick up and drop off packages along the way if doing so does not add significantly to the trip's distance or time. This gives rise to a new business offering called Package Delivery as a Service (PDaaS) that brokers package pickups and deliveries at one end and connects them to drivers at the other, creating an ecosystem of supply and demand. The dramatic cost savings of such a service model come from the fact that the driver is already en route to a destination, so the package delivery adds only a small overhead to an already planned trip. From a technical perspective, this problem introduces challenges that are uncommon in the literature. The driver may want to optimize for distance or time. Furthermore, new packages arrive for delivery all the time and are assigned to drivers continuously, which means the algorithm has to work in a dynamic environment, precluding most standard road-network precomputation efforts. Moreover, the number of packages available for delivery could be in the hundreds of thousands, which must be quickly pruned down for the algorithm to scale. The paper proposes a variation called dual Dijkstra's that combines a forward and a backward scan to find delivery options that satisfy the constraints specified by the driver. The dual heuristic improves on the standard single-Dijkstra approach by narrowing the search space, resulting in significant speed-ups over the standard algorithms. Combining dual Dijkstra's with a heuristic landmark approach yields a further dramatic speed-up over the baseline algorithms. Experimental results show that the proposed approach can offer drivers a choice of packages to deliver under specified constraints of time or distance, with sub-second response times despite the complexity of the problem. As the number of packages in the system increases, matchmaking becomes easier, resulting in faster response times. The scalability of the PDaaS infrastructure is demonstrated through extensive experimental results.
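
The dual-scan idea can be conveyed with a minimal Python sketch, assuming a simple adjacency-list road network: one Dijkstra scan runs forward from the driver's source, a second runs backward from the destination (over the reversed graph), and the two distance tables together prune packages whose detour cannot possibly fit the driver's budget. Function names and the detour predicate here are illustrative, not the paper's implementation.

```python
import heapq

def dijkstra(graph, start):
    """Single-source shortest paths; graph maps node -> [(neighbor, weight), ...]."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_graph(graph):
    """Flip every edge so a forward Dijkstra computes distances *to* a node."""
    rev = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev

def feasible_packages(graph, source, dest, packages, max_detour):
    """Return (pickup, dropoff, extra_distance) for every package whose
    detour fits within max_detour of the driver's direct trip."""
    fwd = dijkstra(graph, source)               # d(source, x) for all reachable x
    bwd = dijkstra(reverse_graph(graph), dest)  # d(x, dest) for all x
    direct = fwd.get(dest, float("inf"))
    feasible = []
    for pickup, dropoff in packages:
        # Lower bound from the two scans prunes most candidates without
        # ever computing the pickup-to-dropoff leg.
        bound = fwd.get(pickup, float("inf")) + bwd.get(dropoff, float("inf"))
        if bound > direct + max_detour:
            continue
        mid = dijkstra(graph, pickup).get(dropoff, float("inf"))
        total = fwd[pickup] + mid + bwd[dropoff]
        if total <= direct + max_detour:
            feasible.append((pickup, dropoff, total - direct))
    return feasible
```

The pruning line is where the dual scan pays off: a candidate package is discarded using only the two precomputed tables, which matters when hundreds of thousands of packages must be filtered per request.
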
              In-path Oracles for Road Networks
Debajyoti Ghosh, Jagan Sankaranarayanan, Kiran Khatter, Hanan Samet
International Journal of Geo-Information, 12(7) (2023), Article 277
Many spatial applications benefit from the fast answering of a seemingly simple spatial query: "Is a point of interest (POI) 'in-path' to the shortest path between a source and a destination?" In-path here refers to POIs that are either on the shortest path or reachable within a small, bounded detour from it. Answering in-path queries quickly is contingent on being able to determine whether a POI is in-path without computing shortest paths at run-time, which calls for a precomputation-based solution.
The key technical contribution is an in-path oracle, based on precomputation, that records the pairs of sources and destinations that are in-path with respect to a given POI location. For a road network with n nodes and m POIs, an O(m × n)-sized oracle is envisioned, based on a reduction using the well-separated pair decomposition of the road network. The oracle can be indexed in a database using a B-tree, and hundreds of thousands of in-path queries per second can be answered. Experimental results on a real road-network and POI dataset showcase the superiority of this technique over a suitable baseline: the proposed approach answers 1.5 million in-path queries per second, compared with a few hundred per second for existing approaches.
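
A minimal sketch of how such an oracle is used at query time, assuming all-pairs shortest-path distances are available during precomputation. The brute-force construction below is O(m · n²) and is for illustration only; the paper's well-separated-pair-decomposition reduction is what brings the oracle down to O(m × n) size. The sorted key list stands in for the B-tree index, and the bounded-detour predicate is a paraphrase, not the paper's exact definition.

```python
import bisect
import itertools

def in_path(dist, s, t, poi, eps):
    """POI is in-path if the detour through it is within (1 + eps) of direct.
    dist is a dict-of-dicts of precomputed shortest-path distances."""
    return dist[s][poi] + dist[poi][t] <= (1 + eps) * dist[s][t]

def build_oracle(nodes, pois, dist, eps):
    """Record every (poi, source, destination) triple that is in-path.
    Brute force for illustration; the WSP-based reduction avoids
    enumerating all O(n^2) source-destination pairs explicitly."""
    return sorted(
        (poi, s, t)
        for poi in pois
        for s, t in itertools.permutations(nodes, 2)
        if in_path(dist, s, t, poi, eps)
    )

def query(oracle, poi, s, t):
    """Answer 'is poi in-path for (s, t)?' with one logarithmic lookup,
    no shortest-path computation at run-time."""
    i = bisect.bisect_left(oracle, (poi, s, t))
    return i < len(oracle) and oracle[i] == (poi, s, t)
```

Because the run-time work is a single ordered-key lookup, throughput is bounded by the index rather than by graph search, which is the source of the queries-per-second gap reported in the abstract.
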
              Cross-lingual text clustering in a large system
Nicole R. Schneider, Jagan Sankaranarayanan, Hanan Samet
7th International Conference on Natural Language Processing and Information Retrieval (NLPIR 2023) (to appear)
The multilingual world needs systems that can cluster text written in multiple languages into the same thread or topic. A practical approach for clustering text in different languages is to first translate it into a common language, such as English, and then cluster it post-translation. While this approach seems sensible, its performance and pitfalls have not been well studied. The reference architecture used for the study is a news system that has continuously indexed news articles over many years in over 19 languages. Through the analysis of these documents and their clusters, the clustering quality is shown to be dependent on the translator's ability to normalize proper noun usage, the geographic focus of the text, and the document topic.
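
A sketch of the translate-then-cluster pipeline studied here, in Python with scikit-learn. The translate_to_english function is a hypothetical stand-in for whatever machine-translation backend is available, and the TF-IDF plus agglomerative-clustering choices are illustrative rather than the system's actual components.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

def translate_to_english(text, source_lang):
    """Hypothetical stand-in: plug in any machine-translation backend."""
    raise NotImplementedError("attach an MT system here")

def cluster_articles(articles, distance_threshold=0.5):
    """articles: list of (text, lang) pairs. Translate everything into the
    common language first, then cluster the post-translation text."""
    english = [t if lang == "en" else translate_to_english(t, lang)
               for t, lang in articles]
    vectors = TfidfVectorizer(stop_words="english").fit_transform(english)
    model = AgglomerativeClustering(
        n_clusters=None,                      # let the threshold decide
        distance_threshold=distance_threshold,
        metric="cosine",
        linkage="average",
    )
    return model.fit_predict(vectors.toarray())
```

The pipeline makes the paper's failure modes easy to see: if translation renders the same proper noun inconsistently across languages, the TF-IDF vectors of articles about the same story drift apart and the threshold splits them into separate clusters.
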
Progressive Partitioning for Parallelized Query Execution in Google's Napa
Junichi Tatemura, Jagan Sankaranarayanan, Yanlai Huang, Jim Chen, Yupu Zhang, Kevin Lai, Divyakant Agrawal, Brad Adelberg, Shilpa Kolhar, Indrajit Roy
            49th International Conference on Very Large Data Bases, VLDB (2023), pp. 3475-3487
Napa powers Google's critical data warehouse needs. It uses a Log-Structured Merge tree (LSM) for real-time data ingestion and achieves sub-second query latency for billions of queries per day. Napa handles a wide variety of query workloads, from full-table scans, to range scans, to multi-key lookups. Our design challenge is to handle this diverse query workload running concurrently. In particular, a large percentage of our query volume consists of external reporting queries characterized by multi-key lookups with strict sub-second latency targets.
Query parallelization, achieved by processing a query in parallel over partitions of the input data (i.e., the SIMD model of computation), is an important technique for meeting these low-latency targets. Traditionally, the effectiveness of parallelizing a query depends heavily on how well the query aligns with the data partitioning established at write time. Unfortunately, such a write-time partitioning scheme cannot handle the highly variable parallelization requirements that arise on a per-query basis.
The key to Napa's success is its ability to adapt its query parallelization on a per-query basis. This paper describes an index-based approach to partitioning data for queries with sub-second latency requirements. Napa's approach is progressive in that it provides good partitioning within whatever time is budgeted for partitioning. Since the end-to-end query time includes the time spent partitioning, there is a tradeoff between the time spent partitioning and the evenness of the resulting partitions. Our approach balances these opposing considerations to provide sub-second querying for billions of queries each day. We use production data to establish the effectiveness of Napa's approach on workloads ranging from the easy to handle to the most pathological.
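
The progressive flavor can be conveyed with a small sketch. It assumes a single sorted key index, which is a simplification: Napa must reconcile many sorted LSM deltas, which is what makes perfectly even partitioning expensive in practice. The partitioner keeps splitting the currently largest range until it has enough partitions or its time budget runs out, so stopping early still yields a usable, if less even, partitioning. Names and the splitting policy are illustrative, not Napa's implementation.

```python
import time

def progressive_partition(index, num_partitions, budget_seconds):
    """index: a sorted list of keys, standing in for an on-disk index.
    Split the largest remaining range at its midpoint until we have
    num_partitions ranges or the time budget is spent."""
    if not index:
        return []
    deadline = time.monotonic() + budget_seconds
    parts = [(0, len(index))]  # half-open [lo, hi) ranges over the index
    while len(parts) < num_partitions and time.monotonic() < deadline:
        parts.sort(key=lambda r: r[1] - r[0], reverse=True)
        lo, hi = parts[0]
        if hi - lo < 2:
            break  # every range holds a single key; nothing left to split
        mid = (lo + hi) // 2
        parts[0:1] = [(lo, mid), (mid, hi)]
    # Report each partition as an inclusive key range for the workers.
    return [(index[lo], index[hi - 1]) for lo, hi in sorted(parts)]

# e.g. progressive_partition(list(range(1_000_000)), 16, 0.001) returns up to
# 16 key ranges; a tighter budget simply returns fewer, coarser ranges.
```

The design point this illustrates is the tradeoff named in the abstract: every extra split improves evenness, but the splitting time is itself part of end-to-end query latency, so the loop must be interruptible at any point.
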
Napa: Powering Scalable Data Warehousing with Robust Query Performance at Google
Kevin Lai, Indrajit Roy, Jagan Sankaranarayanan, Min Chen, Jim Chen, Ming Dai, Thanh Do, Haoyu Gao, Haoyan Geng, Raman Grover, Bo Huang, Yanlai Huang, Adam Li, Jianyi Liang, Tao Lin, Li Liu, Yao Liu, Xi Mao, Maya Meng, Prashant Mishra, Jay Patel, Vijayshankar Raman, Sourashis Roy, Mayank Singh Shishodia, Tianhang Sun, Justin Tang, Junichi Tatemura, Sagar Trehan, Ramkumar Vadali, Prasanna Venkatasubramanian, Joey Zhang, Kefei Zhang, Yupu Zhang, Zeleng Zhuang, Divyakant Agrawal, Jeff Naughton, Sujata Sunil Kosalge, Hakan Hacıgümüş
Proceedings of the VLDB Endowment (PVLDB), 14(12) (2021), pp. 2986-2998
There are numerous Google services that continuously generate vast amounts of log data, which are used to provide valuable insights to internal and external business users. We need to store and serve these planet-scale data sets under extremely demanding requirements: scalability, sub-second query response times, availability even in the case of entire data center failures, strong consistency guarantees, and ingestion of a massive stream of updates coming from applications used around the globe. We have developed and deployed in production an analytical data management system, called Napa, to meet these requirements. Napa is the backend for multiple internal and external clients at Google, so there is a strong expectation of variance-free, robust query performance. At its core, Napa's principal technology for robust query performance is the aggressive use of materialized views, which are maintained consistently as new data is ingested across multiple data centers. Our clients also demand flexibility in adjusting their query performance, data freshness, and costs to suit their unique needs. Robust query processing and flexible configuration of client databases are the hallmarks of Napa's design. Most related work in this area takes advantage of full design flexibility without the need to support a diverse set of preexisting use cases, whereas Napa must deal with the hard constraints of applications that differ on which characteristics of the system are most important to optimize. Those constraints led us to make particular design decisions and devise new techniques to meet the challenges. In this paper, we share our experiences in designing, implementing, deploying, and running Napa in production with some of Google's most demanding applications.
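
The one concrete mechanism named here, materialized views kept consistent with ingestion, can be caricatured in a few lines of Python. This toy class is purely illustrative (the class and method names are invented and nothing below reflects Napa's actual storage or consistency machinery); it shows only the invariant that a batch updates the base data and every view together, so queries served from views never diverge from the base table.

```python
from collections import defaultdict

class ViewMaintainingTable:
    """Toy model: each ingested batch is applied to the base rows and to
    every materialized aggregate view as one step, never partially."""

    def __init__(self, view_keys):
        self.rows = []
        # One view per grouping key, each mapping group -> running sum.
        self.views = {k: defaultdict(int) for k in view_keys}

    def ingest(self, batch):
        """batch: list of dicts with the grouping keys and a 'value' field."""
        # Stage all view deltas first, so a bad row fails the whole batch
        # before anything is applied.
        staged = {k: defaultdict(int) for k in self.views}
        for row in batch:
            for key in self.views:
                staged[key][row[key]] += row["value"]
        # "Commit": base rows and view deltas land together.
        self.rows.extend(batch)
        for key, delta in staged.items():
            for group, inc in delta.items():
                self.views[key][group] += inc

    def query_view(self, key, group):
        # Answered from the view alone, without scanning the base rows.
        return self.views[key][group]

# Usage: t = ViewMaintainingTable(["country"])
#        t.ingest([{"country": "US", "value": 3}, {"country": "IN", "value": 5}])
#        t.query_view("country", "US")  # -> 3
```
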