Google Research

Metadata for datasets in CORD-19 papers

Description

The dataset contains paper-dataset pairs for datasets mentioned or referenced in CORD-19 papers. CORD-19 is an open research dataset of papers relevant to COVID-19. Specifically, this dataset contributes the metadata for the referenced datasets, collected from their schema.org descriptions across data repositories on the Web.

The CORD-19 dataset is a collaboration between multiple institutions to speed up research on coronavirus and COVID-19, and we are enhancing this dataset with additional metadata.

Google's Dataset Search makes it easier for researchers, students, and data geeks to discover datasets that they need for their work. It is built on the idea that metadata and data should be open whenever possible. Dataset Search relies entirely on data providers making their metadata openly available through schema.org annotations on their sites. The Dataset Search metadata is largely a cleaned-up collection of the data made public by data providers on the Web. The collection described here is the normalized, cleaned-up metadata for the datasets that are mentioned in the CORD-19 papers, organized as paper-dataset pairs.
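
To illustrate what a schema.org annotation on a provider's page can look like, the following is a minimal sketch, not the pipeline used to build this collection. It assumes the provider embeds a schema.org Dataset description as JSON-LD inside a script tag, which is one common way to publish such metadata. The sample page, its field values, and the helper names (JsonLdExtractor, extract_datasets) are hypothetical and exist only for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the text of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_datasets(html):
    """Return every JSON-LD object on the page whose @type is Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    datasets = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed annotations
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Dataset":
                datasets.append(item)
    return datasets


# Hypothetical provider page, invented for illustration only.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example COVID-19 case counts",
  "description": "A made-up dataset description used for this sketch.",
  "url": "https://example.org/datasets/covid-cases"
}
</script>
</head><body></body></html>
"""

datasets = extract_datasets(SAMPLE_PAGE)
for d in datasets:
    print(d["name"], d["url"])
```

Pages without such an annotation yield an empty list in this sketch, which mirrors the first limitation listed below.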

Limitations:

  • Only datasets that have schema.org metadata on their pages are included.
  • Because we identify the paper-dataset correspondences automatically, some correspondences may be missing and some may be spurious.