CDC Birth Vital Statistics in BigQuery

January 13, 2012

Posted by Dan Vanderkam, Software Engineer

Google’s BigQuery Service lets enterprises and developers crunch large-scale data sets quickly. But what if you don’t have a large-scale data set of your own?

To help the data-less masses, BigQuery offers several large, public data sets. One of these is the natality data set, which records information about live births in the United States. The data is derived from the Division of Vital Statistics at the Centers for Disease Control and Prevention, which has collected an electronic record of birth statistics since 1969. It is one of the longest-running electronic records in existence.

Each row in this data set represents a live birth. Using simple queries, you can discover fascinating trends from the last forty years.

For example, you can compute the average age of women giving birth to their first child in each year.

The average age has increased from 21.3 years in 1969 to 25.1 years in 2008. Using more complex queries, one could analyze the factors that have contributed to this increase, for example whether it can be explained by the changing racial/ethnic composition of the population.
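As a rough sketch of how such a trend could be computed, the Python snippet below pairs a query string with a small local function that performs the same aggregation. The table path and the column names (year, mother_age, ever_born) are assumptions not stated in this post, as is the reading of ever_born = 1 as "first child"; check them against the published schema before relying on them. The sample rows are made up purely for illustration and are not real natality data.

```python
from collections import defaultdict

# Assumed table path and column names; verify against the natality schema.
FIRST_BIRTH_AGE_QUERY = """
SELECT year, AVG(mother_age) AS avg_mother_age
FROM `bigquery-public-data.samples.natality`
WHERE ever_born = 1
GROUP BY year
ORDER BY year
"""


def average_first_birth_age(rows):
    """Average mother's age per year, counting first births only.

    Mirrors the query above on an in-memory list of dicts so the
    aggregation logic can be checked without a BigQuery account.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of ages, count]
    for row in rows:
        # Skip later births and rows with a missing mother's age.
        if row["ever_born"] != 1 or row["mother_age"] is None:
            continue
        acc = totals[row["year"]]
        acc[0] += row["mother_age"]
        acc[1] += 1
    return {year: age_sum / n for year, (age_sum, n) in sorted(totals.items())}


# Hypothetical rows, invented for illustration only.
sample_rows = [
    {"year": 1969, "mother_age": 20, "ever_born": 1},
    {"year": 1969, "mother_age": 22, "ever_born": 1},
    {"year": 1969, "mother_age": 30, "ever_born": 3},    # not a first birth
    {"year": 2008, "mother_age": 24, "ever_born": 1},
    {"year": 2008, "mother_age": 26, "ever_born": 1},
    {"year": 2008, "mother_age": None, "ever_born": 1},  # age not recorded
]

if __name__ == "__main__":
    for year, avg_age in average_first_birth_age(sample_rows).items():
        print(year, round(avg_age, 1))
```

The query string is what you would submit to BigQuery itself; the local function only exists to make the grouping and filtering steps explicit and testable.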

You can see more examples like this one on the BigQuery site.