Himani Apte
Research Areas
Authored Publications
Sort By
F1 Query: Declarative Querying at Scale
Bart Samwel
Ben Handy
Jason Govig
Chanjun Yang
Daniel Tenedorio
Felix Weigel
David G Wilhite
Jiacheng Yang
Jun Xu
Jiexing Li
Zhan Yuan
Qiang Zeng
Ian Rae
Anurag Biyani
Andrew Harn
Yang Xia
Andrey Gubichev
Amr El-Helw
Orri Erling
Allen Yan
Mohan Yang
Yiqun Wei
Thanh Do
Colin Zheng
Somayeh Sardashti
Ahmed Aly
Divy Agrawal
Shivakumar Venkataraman
PVLDB (2018), pp. 1835-1848
Preview abstract
F1 Query is a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats as well as different storage systems (e.g., BigTable, Spanner, Google Spreadsheets, etc.). F1 Query eliminates the need to maintain the traditional distinction between different types of data processing workloads by simultaneously supporting: (i) OLTP-style point queries that affect only a few records; (ii) low-latency OLAP querying of large amounts of data; and (iii) large ETL pipelines transforming data from multiple data sources into formats more suitable for analysis and reporting. F1 Query has also significantly reduced the need for developing hard-coded data processing pipelines by enabling declarative queries integrated with custom business logic. F1 Query satisfies key requirements that are highly desirable within Google: (i) it provides a unified view over data that is fragmented and distributed over multiple data sources; (ii) it leverages datacenter resources for performant query processing with high throughput and low latency; (iii) it provides high scalability for large data sizes by increasing computational parallelism; and (iv) it is extensible and uses innovative approaches to integrate complex business logic in declarative query processing. This paper presents the end-to-end design of F1 Query. Evolved out of F1, the distributed database that Google uses to manage its advertising data, F1 Query has been in production for multiple years at Google and serves the querying needs of a large number of users and systems.
View details
Shasta: Interactive Reporting at Scale
Stephan Ellner
Stephan Gudmundson
Apurv Gupta
Ben Handy
Bart Samwel
Chad Whipkey
Larysa Aharkava
Jun Xu
Shivakumar Venkataraman
Divy Agrawal
Jeffrey D. Ullman
SIGMOD, San Francisco, CA (2016) (to appear)
Preview abstract
We describe Shasta, a middleware system built at Google to support interactive reporting in complex user-facing applications related to Google’s Internet advertising business. Shasta targets applications with challenging requirements: First, user query latencies must be low. Second, underlying transactional data stores have complex “read-unfriendly” schemas, placing significant transformation logic between stored data and the read-only views that Shasta exposes to its clients. This transformation logic must be expressed in a way that scales to large and agile engineering teams. Finally, Shasta targets applications with strong data freshness requirements, making it challenging to precompute query results using common techniques such as ETL pipelines or materialized views. Instead, online queries must go all the way from primary storage to userfacing views, resulting in complex queries joining 50 or more tables.
Designed as a layer on top of Google’s F1 RDBMS and Mesa data warehouse, Shasta combines language and system techniques to meet these requirements. To help with expressing complex view specifications, we developed a query language called RVL, with support for modularized view templates that can be dynamically compiled into SQL. To execute these SQL queries with low latency at scale, we leveraged and extended F1’s distributed query engine with facilities such as safe execution of C++ and Java UDFs. To reduce latency and increase read parallelism, we extended F1 storage with a distributed read-only in-memory cache. The system we describe is in production at Google, powering critical applications used by advertisers and internal sales teams. Shasta has significantly improved system scalability and software engineering efficiency compared to the middleware solutions it replaced.
View details
F1: A Distributed SQL Database That Scales
Bart Samwel
Ben Handy
Chad Whipkey
Mircea Oancea
Kyle Littlefield
David Menestrina
Stephan Ellner
Ian Rae
Traian Stancescu
VLDB (2013)
Preview abstract
F1 is a distributed relational database system built at
Google to support the AdWords business. F1 is a hybrid
database that combines high availability, the scalability of
NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides synchronous cross-datacenter replication and strong consistency. Synchronous replication implies higher commit latency, but we mitigate that latency
by using a hierarchical schema model with structured data
types and through smart application design. F1 also includes a fully functional distributed SQL query engine and
automatic change tracking and publishing.
View details