The YouTube-8M Segments dataset is an extension of the YouTube-8M dataset with human-verified segment annotations. In addition to annotating videos, the task includes temporally localizing the entities in the videos, i.e., finding out when the entities occur.
We collected human-verified labels on about 237K segments across 1000 classes from the validation set of the YouTube-8M dataset. Each video comes with time-localized frame-level features, so classifier predictions can be made at segment-level granularity. We encourage researchers to leverage the large number of noisy video-level labels in the YouTube-8M training set to train models for temporal localization.
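To make the idea of segment-level prediction from frame-level features concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the dataset's official pipeline. The function names `mean_pool` and `segment_predictions`, the `segment_len` parameter, the use of mean pooling, and the toy classifier are all hypothetical choices made for this example. The sketch also assumes the features arrive as one vector per time step.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty run of frame feature vectors into one vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def segment_predictions(
    frame_features: Sequence[Sequence[float]],
    classify: Callable[[Vector], float],
    segment_len: int = 5,  # assumption: segment length in frames, chosen for illustration
) -> List[Tuple[int, int, float]]:
    """Score consecutive, non-overlapping segments of a video.

    Returns (start_frame, end_frame_exclusive, score) for each segment.
    The final segment may be shorter than segment_len.
    """
    results = []
    for start in range(0, len(frame_features), segment_len):
        chunk = frame_features[start:start + segment_len]
        results.append((start, start + len(chunk), classify(mean_pool(chunk))))
    return results


# Toy data: six frames with 1-D features; the "entity" appears from frame 2 on.
frames = [[0.0], [0.0], [1.0], [1.0], [1.0], [1.0]]

# Hypothetical stand-in classifier: the score is just the first feature value.
preds = segment_predictions(frames, classify=lambda v: v[0], segment_len=2)
print(preds)  # [(0, 2, 0.0), (2, 4, 1.0), (4, 6, 1.0)]
```

In practice the `classify` callable would be a model trained on the video-level labels, and the per-segment scores would then indicate when each entity occurs in the video.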