Data Management

Google is deeply engaged in Data Management research across a variety of topics with close connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface it creatively through Google products such as Search (e.g., structured snippets), Docs, and many others. The overarching goal is to create a wealth of structured data on the Web that helps Google users consume, interact with, and explore information. Through these projects, we study a range of data management research issues, including information extraction and integration, large-scale data analysis, and effective data exploration, using techniques such as information retrieval, data mining, and machine learning.

A major research effort involves the management of structured data within the enterprise. The goal is to discover, index, monitor, and organize this data so that high-quality datasets are easier to access. Enterprise data carries different, and often richer, semantics than structured data on the Web, which in turn raises new opportunities and technical challenges in its management.

Furthermore, Data Management research across Google allows us to build the technologies that power Google's largest businesses: scalable, reliable, fast, and general-purpose infrastructure for large-scale data processing as a service. Examples include F1, the database serving our ads infrastructure; Mesa, a petabyte-scale analytic data warehousing system; and Dremel, for petabyte-scale data processing with interactive response times. Dremel is available to external customers as part of Google Cloud’s BigQuery.

Recent Publications

Vortex: A Stream-oriented Storage Engine For Big Data Analytics

Pavan Edara

Jonathan Forbes

Bigang Li

SIGMOD (2024)

Discovering Datasets on the Web Scale: Challenges and Recommendations for Google Dataset Search

Katrina Sostek

Daniel Russell

Tesh Goyal

Tarfah Alrashed

Stella Dugall

Natasha Noy

Harvard Data Science Review (2024)

Automatic Histograms: Leveraging Language Models for Text Dataset Exploration

Emily Reif

Crystal Qian

James Wexler

Minsuk Kahng

Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), ACM, Honolulu, HI, USA (2024), pp. 9

Chain-of-Table: Evolves Tables in the LLM Reasoning Chain for Table Understanding

Zilong Wang

Hao Zhang

Chun-Liang Li

Julian Eisenschlos

Vincent Perot

Zifeng Wang

Lesly Miculicich

Yasuhisa Fujii

Jingbo Shang

Chen-Yu Lee

Tomas Pfister

ICLR (2024)

BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse

Justin Levandoski

Garrett Casto

Mingge Deng

Rushabh Desai

Pavan Edara

Thibaud Hottelier

Amir Hormati

Anoop Johnson

Jeff Johnson

Dawid Kurzyniec

Sam McVeety

Prem Ramanathan

Gaurav Saxena

Vidya Shanmugam

Yuri Volobuev

SIGMOD (2024)

Are we cobblers without shoes? Making Computer Science data FAIR

Natasha Noy

Carole Goble

Communications of the ACM, 66(1) (2023)