Slicing and dicing data for interactive visualization

February 28, 2011

Posted by Benjamin Yolken, Google Public Data Product Manager



A year ago, we introduced the Google Public Data Explorer, a tool that allows users to interactively explore public-interest datasets from a variety of influential sources like the World Bank, IMF, Eurostat, and the US Census Bureau. Today, users can visualize over 300 metrics across 31 datasets, including everything from labor productivity (OECD) to Internet speed (Ookla) to gender balance in parliaments (UNECE) to government debt levels (IMF) to population density by municipality (Statistics Catalonia), with more data being added every week.

Last week, as part of the launch of our dataset upload interface, we released one of the core pieces of technology behind the product: the Dataset Publishing Language (DSPL). We created this format to address a key problem in the Public Data Explorer and similar tools: existing data formats don’t provide enough information to support easy yet powerful data exploration by non-technical users.

DSPL addresses this by adding a layer of metadata on top of the raw, tabular data in a dataset. This metadata, expressed in XML, describes the concepts in the dataset, for instance “country”, “gender”, “population”, and “unemployment”, and gives descriptions, URLs, formatting properties, etc. for each. These concepts are then referenced in slices, which divide the concepts they use into dimensions (i.e., categories) and metrics (i.e., quantitative values) and link them with the underlying data tables (provided in CSV format). This structure, along with some additional metadata, is what allows us to provide rich, interactive dataset visualizations in the Public Data Explorer.
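
To make the concept and slice idea more concrete, here is a minimal Python sketch of the data model described above. It is not the DSPL XML schema or any official library: the class names (Concept, Slice), the helper functions (read_slice, total_by), and the tiny CSV table with its placeholder country codes and numbers are all assumptions invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A named idea in the dataset, with human-readable metadata (hypothetical)."""
    id: str
    description: str


@dataclass(frozen=True)
class Slice:
    """Splits a set of concepts into dimensions and metrics over one table (hypothetical)."""
    id: str
    dimensions: tuple  # concept ids used as categories
    metrics: tuple     # concept ids used as quantitative values
    table_csv: str     # the underlying data, as CSV text


# Illustrative concepts, named after the examples in the text.
CONCEPTS = {c.id: c for c in [
    Concept("country", "Country"),
    Concept("gender", "Gender"),
    Concept("population", "Number of people"),
]}

# Made-up placeholder data, not real statistics.
TABLE = "country,gender,population\nAA,female,10\nAA,male,12\nBB,female,7\n"

SLICE = Slice("population_by_country_gender",
              dimensions=("country", "gender"),
              metrics=("population",),
              table_csv=TABLE)


def read_slice(s, concepts):
    """Return {dimension values: {metric id: value}} for every row of the slice."""
    for concept_id in s.dimensions + s.metrics:
        if concept_id not in concepts:
            raise KeyError(f"slice {s.id} references unknown concept {concept_id}")
    data = {}
    for row in csv.DictReader(io.StringIO(s.table_csv)):
        key = tuple(row[d] for d in s.dimensions)
        data[key] = {m: float(row[m]) for m in s.metrics}
    return data


def total_by(data, s, dimension, metric):
    """Sum one metric over all rows that share a value of one dimension."""
    index = s.dimensions.index(dimension)
    totals = {}
    for key, values in data.items():
        totals[key[index]] = totals.get(key[index], 0.0) + values[metric]
    return totals


data = read_slice(SLICE, CONCEPTS)
totals = total_by(data, SLICE, "country", "population")
```

Because the slice states which columns are categories and which are quantities, a generic tool can decide what to group by and what to sum without knowing anything else about the table, which is the kind of information a bare CSV file does not carry.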

With the release of DSPL, we hope to accelerate the process of making the world’s datasets searchable, visualizable, and understandable, without requiring a PhD in statistics. We encourage you to read more about the format and try it yourself, both in the Public Data Explorer and in your own software. Stay tuned for more DSPL extensions and applications in the future!