Data Commons
Abstract
Publicly available data from open sources (e.g., Census [1], BLS [2], WHO [3],
IPCC [4]) are vital resources for policy makers, students and researchers across different
disciplines. Combining data from different sources requires the user to reconcile the
differences in schemas, formats, assumptions, and more. This data wrangling is
time-consuming and tedious, and it must be repeated by every user of the data. Our goal with
Data Commons is to address this problem by doing the wrangling once and making the processed
data widely available via standard schemas and Cloud APIs. Data Commons is a
distributed network of sites that publish data in a common schema and interoperate
using the Data Commons APIs. Data from different Data Commons can be ‘joined’
easily. The aggregate of these Data Commons can be viewed as a single Knowledge
Graph. This paper describes the architecture of Data Commons, presents some of the major
deployments, and highlights directions for future work.