Google Research

How complete are the CDC's COVID-19 Case Surveillance and NCHS datasets for deaths with race/ethnicity at the state and county levels?

Google, Inc. (2021)


The COVID Tracking Project was the most reliable source of state-level COVID-19 data with race/ethnicity until it stopped collecting data on March 7, 2021. The CDC's Case Surveillance Restricted Access dataset and the National Center for Health Statistics (NCHS) provisional deaths dataset are the best available replacements for the COVID Tracking Project's dataset, and they additionally include county-level data and age along with race/ethnicity. This paper evaluates the completeness of the CDC datasets at the state and county levels in terms of (1) the total number of deaths included, compared to the New York Times, and (2) the number of deaths included with race/ethnicity data, compared to the COVID Tracking Project.

The CDC's Restricted Access dataset contains 79% of the deaths in the New York Times up to April 15, and 84% of its deaths have race/ethnicity information, versus 93% in the COVID Tracking Project. At the state and county levels, the dataset's completeness is highly variable, with 11 states reporting fewer than 10% of deaths and eight reporting 0% of the deaths included in the New York Times. The National Center for Health Statistics' dataset is highly complete in all states except for North Carolina. At the county level, the National Center for Health Statistics' dataset is more complete within the counties it contains, but it only contains counties with at least 100 COVID-19 deaths, which are generally counties with larger populations.
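
The two completeness measures described above can be illustrated with a minimal sketch. This is not the paper's code: the `RegionCounts` type, the function names, and the counts in the example are all hypothetical, and the sketch assumes that race/ethnicity completeness is measured as the share of a dataset's own deaths that carry race/ethnicity information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionCounts:
    """Death counts for one state or county (hypothetical structure)."""
    cdc_deaths: int            # deaths present in the CDC dataset
    cdc_deaths_with_race: int  # of those, deaths with race/ethnicity recorded
    reference_deaths: int      # deaths in the reference source (e.g. New York Times)


def total_completeness(c: RegionCounts) -> Optional[float]:
    """Fraction of the reference source's deaths that appear in the CDC dataset."""
    if c.reference_deaths == 0:
        return None  # undefined when the reference reports no deaths
    return c.cdc_deaths / c.reference_deaths


def race_completeness(c: RegionCounts) -> Optional[float]:
    """Fraction of the CDC dataset's deaths that carry race/ethnicity data."""
    if c.cdc_deaths == 0:
        return None  # undefined when the dataset contains no deaths
    return c.cdc_deaths_with_race / c.cdc_deaths


# Made-up counts for illustration only; not taken from any real dataset.
example = RegionCounts(cdc_deaths=800, cdc_deaths_with_race=600, reference_deaths=1000)
print(total_completeness(example))  # 0.8
print(race_completeness(example))   # 0.75
```

Returning `None` for a zero denominator keeps regions with no reported deaths distinct from regions whose completeness is genuinely 0%.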
