The AudioSet dataset is a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. To collect all our data we worked with human annotators who verified the presence of sounds they heard within YouTube segments. To nominate segments for annotation, we relied on YouTube metadata and content-based search.
The sound events in the dataset consist of a subset of the AudioSet ontology. You can learn more about the dataset construction in our ICASSP 2017 paper.