Data Commons

Samantha Piekos
R.V. Guha
Natalie Diaz
Prem Ramaswami
Bo Xu
Jennifer Chen
Wei Sun
Jehangir Amjad
James Manyika
Prashanth Radhakrishnan
Julia Wu
Carolyn Au
Ajai Tirumali
(2023)

Abstract

Publicly available data from open sources (e.g., Census [1], BLS [2], WHO [3],
IPCC [4]) are vital resources for policymakers, students, and researchers across different
disciplines. Combining data from different sources requires the user to reconcile the
differences in schemas, formats, assumptions, and more. This data wrangling is
time-consuming and tedious, and it must be repeated by every user of the data. Our goal with
Data Commons is to address this problem by doing the wrangling once and making the processed
data widely available via standard schemas and Cloud APIs. Data Commons is a
distributed network of sites that publish data in a common schema and interoperate
using the Data Commons APIs. Data from different Data Commons can be ‘joined’
easily. The aggregate of these Data Commons can be viewed as a single Knowledge
Graph. This paper describes the architecture of Data Commons and some of its major
deployments, and highlights directions for future work.

Research Areas