Data Commons

Prashanth Radhakrishnan
Bo Xu
Carolyn Au
Wei Sun
Jehangir Amjad
Ajai Tirumali
Jennifer Chen
Julia Wu
Natalie Diaz
Samantha Piekos
Prem Ramaswami
James Manyika


Publicly available data from open sources (e.g., Census [1], BLS [2], WHO [3], IPCC [4]) are vital resources for policy makers, students, and researchers across different disciplines. Combining data from different sources requires the user to reconcile differences in schemas, formats, assumptions, and more. This data wrangling is time-consuming and tedious, and it must be repeated by every user of the data. Our goal with Data Commons is to address this problem by doing the wrangling once and making the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data Commons APIs. Data from different Data Commons can be ‘joined’ easily. The aggregate of these Data Commons can be viewed as a single Knowledge Graph. This paper describes the architecture of Data Commons, presents some of the major deployments, and highlights directions for future work.
