Google Research

Data Commons

  • R.V. Guha
  • Prashanth Radhakrishnan
  • Bo Xu
  • Carolyn Au
  • Wei Sun
  • Jehangir Amjad
  • Ajai Tirumali
  • Jennifer Chen
  • Julia Wu
  • Natalie Diaz
  • Samantha Piekos
  • Prem Ramaswami
  • James Manyika
(2023)

Abstract

Publicly available data from open sources (e.g., Census [1], BLS [2], WHO [3], IPCC [4]) are vital resources for policymakers, students and researchers across many disciplines. Combining data from different sources requires the user to reconcile differences in schemas, formats, assumptions and more. This data wrangling is time-consuming and tedious, and it must be repeated by every user of the data. Our goal with Data Commons is to address this problem by doing this work once and making the processed data widely available through standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be ‘joined’ easily, and the aggregate of these Data Commons can be viewed as a single Knowledge Graph. This paper describes the architecture of Data Commons and some of its major deployments, and highlights directions for future work.
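Joining data across Data Commons is easy because of the common schema. If every site refers to the same entity with the same identifier and uses the same property names, merging their data reduces to a union of statements. Below is a minimal, hypothetical Python sketch of that idea, assuming each site publishes simple (entity, property, value) triples. The identifiers (`place/A`, `place/B`), property names and numeric values are illustrative placeholders, not real Data Commons identifiers or statistics. The code does not call the Data Commons APIs.

```python
from collections import defaultdict

# Hypothetical triple format: (entity_id, property, value).
# Both sites use the same entity IDs and property names (a shared schema).
Triple = tuple[str, str, object]

# Placeholder data from two hypothetical sites; the values are illustrative only.
site_one: list[Triple] = [
    ("place/A", "name", "Place A"),
    ("place/A", "population", 1000),
    ("place/B", "name", "Place B"),
    ("place/B", "population", 2500),
]
site_two: list[Triple] = [
    ("place/A", "unemploymentRate", 4.0),
    ("place/B", "unemploymentRate", 6.5),
]


def merge_graphs(*sources):
    """Union the triples from every source into one graph: entity -> {property: value}.

    If two sources assert a value for the same (entity, property) pair,
    the value from the later source overwrites the earlier one.
    """
    graph = defaultdict(dict)
    for source in sources:
        for entity, prop, value in source:
            graph[entity][prop] = value
    return dict(graph)


def join(graph, *props):
    """Return (entity, value_1, ..., value_n) rows for entities that have every property."""
    rows = []
    for entity in sorted(graph):
        node = graph[entity]
        if all(p in node for p in props):
            rows.append((entity, *(node[p] for p in props)))
    return rows


kg = merge_graphs(site_one, site_two)
print(join(kg, "population", "unemploymentRate"))
# [('place/A', 1000, 4.0), ('place/B', 2500, 6.5)]
```

The shared identifiers do all the work here. The join needs no per-source key mapping because both sites already agree on what `place/A` means. The "last source wins" conflict rule is a simplifying assumption of this sketch. A real system would need an explicit policy for conflicting or provenance-tagged values.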
