Anoop Johnson

Anoop Johnson

Anoop Johnson is a Principal Engineer at Google working on BigQuery. At Google, Anoop is a technical lead of BigQuery storage, which spans data lake and BigQuery native storage. Prior to Google, Anoop was a Principal Engineer at Amazon Web Services (AWS), where he worked on Amazon EMR and Athena and prior to that at A9, working on search for Amazon.com.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse
    Justin Levandoski
    Garrett Casto
    Mingge Deng
    Rushabh Desai
    Thibaud Hottelier
    Amir Hormati
    Jeff Johnson
    Dawid Kurzyniec
    Prem Ramanathan
    Gaurav Saxena
    Vidya Shanmugam
    Yuri Volobuev
    SIGMOD (2024)
    Preview abstract BigQuery’s cloud-native disaggregated architecture has allowed Google Cloud to evolve the system to meet several customer needs across the analytics and AI/ML workload spectrum. A key customer requirement for BigQuery centers around the unification of data lake and enterprise data warehousing workloads. This approach combines: (1) the need for core data management primitives, e.g., security, governance, common runtime metadata, performance acceleration, ACID transactions, provided by an enterprise data warehouses coupled with (2) harnessing the flexibility of the open source format and analytics ecosystem along with new workload types such as AI/ML over unstructured data on object storage. In addition, there is a strong requirement to support BigQuery as a multi-cloud offering given cloud customers are opting for a multi-cloud footprint by default. This paper describes BigLake, an evolution of BigQuery toward a multi-cloud lakehouse to address these customer requirements in novel ways. We describe three main innovations in this space. We first present BigLake tables, making open-source table formats (e.g., Apache Parquet, Iceberg) first class citizens, providing fine-grained governance enforcement and performance acceleration over these formats to BigQuery and other open-source analytics engines. Next, we cover the design and implementation of BigLake Object tables that allow BigQuery to integrate AI/ML for inferencing and processing over unstructured data. Finally, we present Omni, a platform for deploying BigQuery on non-GCP clouds, focusing on the infrastructure and operational innovations we made to provide an enterprise lakehouse product regardless of the cloud provider hosting the data. View details