Justin Levandoski
Justin Levandoski is a Director of Engineering at Google working on BigQuery, where he leads the Lake Analytics and Omni cross-cloud data warehousing infrastructure efforts. At Google, he founded the BigLake project to unify data lake and warehouse capabilities within BigQuery and to extend BigQuery’s reach to unstructured data through Object Tables, broadening BigQuery’s AI/ML platform capabilities for customers. Prior to Google, Justin was a principal engineer at Amazon Web Services (AWS), where he worked on Amazon Aurora. Before that, he was a member of the database group at Microsoft Research, where he worked on main-memory databases, database support for new hardware platforms, transaction processing, and cloud computing. His research was commercialized in a number of Microsoft products, including the SQL Server Hekaton main-memory database engine, Azure CosmosDB, and Azure SQL Hyperscale. He continues to serve on program and organizing committees for top database conferences such as ACM SIGMOD, VLDB, ICDE, CIDR, and HPTS.
Authored Publications
BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse
Garrett Casto
Mingge Deng
Rushabh Desai
Thibaud Hottelier
Amir Hormati
Jeff Johnson
Dawid Kurzyniec
Prem Ramanathan
Gaurav Saxena
Vidya Shanmugam
Yuri Volobuev
SIGMOD (2024)
BigQuery’s cloud-native disaggregated architecture has allowed Google Cloud to evolve the system to meet several customer needs across the analytics and AI/ML workload spectrum. A key customer requirement for BigQuery centers around the unification of data lake and enterprise data warehousing workloads. This approach combines: (1) the need for core data management primitives (e.g., security, governance, common runtime metadata, performance acceleration, ACID transactions) provided by an enterprise data warehouse, coupled with (2) the flexibility of the open source format and analytics ecosystem, along with new workload types such as AI/ML over unstructured data on object storage. In addition, there is a strong requirement to support BigQuery as a multi-cloud offering, given that cloud customers are opting for a multi-cloud footprint by default.
This paper describes BigLake, an evolution of BigQuery toward a multi-cloud lakehouse to address these customer requirements in novel ways. We describe three main innovations in this space. We first present BigLake tables, which make open-source table formats (e.g., Apache Parquet, Iceberg) first-class citizens, providing fine-grained governance enforcement and performance acceleration over these formats to BigQuery and other open-source analytics engines. Next, we cover the design and implementation of BigLake Object tables, which allow BigQuery to integrate AI/ML for inferencing and processing over unstructured data. Finally, we present Omni, a platform for deploying BigQuery on non-GCP clouds, focusing on the infrastructure and operational innovations we made to provide an enterprise lakehouse product regardless of the cloud provider hosting the data.