Announcing an updated YouTube-8M, and the 2nd YouTube-8M Large-Scale Video Understanding Challenge and Workshop

June 5, 2018

Posted by Joonseok Lee, Software Engineer, Google AI



Last year, we organized the first YouTube-8M Large-Scale Video Understanding Challenge with Kaggle, in which 742 teams consisting of 946 individuals from 60 countries used the YouTube-8M dataset (2017 edition) to develop classification algorithms that accurately assign video-level labels. The purpose of the competition was to accelerate improvements in large-scale video understanding, representation learning, noisy data modeling, transfer learning, and domain adaptation approaches that can help improve the machine-learning models that classify video. In addition to the competition, we hosted an affiliated workshop at CVPR’17, inviting competition top performers and researchers to share their ideas on how to advance the state-of-the-art in video understanding.

As a continuation of these efforts to accelerate video understanding, we are excited to announce another update to the YouTube-8M dataset, a new Kaggle video understanding challenge and an affiliated 2nd Workshop on YouTube-8M Large-Scale Video Understanding, to be held at the 2018 European Conference on Computer Vision (ECCV'18).

An Updated YouTube-8M Dataset (2018 Edition)
Our YouTube-8M (2018 edition) features a major improvement in annotation quality, obtained using a machine learning system that combines audio-visual content with title, description, and other metadata to provide more accurate ground-truth annotations. The updated version contains 6.1 million URLs labeled with a vocabulary of 3,862 visual entities; each video is annotated with one or more of these labels, with an average of 3 labels per video. We have also updated the starter code, with new instructions for downloading the dataset and training TensorFlow video annotation models on it.
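
As a rough illustration of working with the video-level data, here is a minimal TensorFlow 1.x-style sketch that reads examples from a downloaded TFRecord shard and builds multi-hot label targets over the 3,862-entity vocabulary. The file name and the feature names and sizes ("id", "labels", "mean_rgb", "mean_audio") are assumptions for illustration only; the starter code remains the authoritative reference for the actual format.

import tensorflow as tf  # TensorFlow 1.x style, matching the 2018-era starter code

# Hypothetical path to one downloaded video-level TFRecord shard.
RECORD_PATH = "train0093.tfrecord"
VOCAB_SIZE = 3862  # vocabulary size of the 2018 edition

def parse_video_level(serialized_example):
    # Assumed video-level format: a video id, a variable-length list of
    # label indices, and mean-pooled 1024-d visual / 128-d audio features.
    features = tf.parse_single_example(serialized_example, {
        "id": tf.FixedLenFeature([], tf.string),
        "labels": tf.VarLenFeature(tf.int64),
        "mean_rgb": tf.FixedLenFeature([1024], tf.float32),
        "mean_audio": tf.FixedLenFeature([128], tf.float32),
    })
    # Turn the sparse label indices into a multi-hot target vector.
    labels = tf.sparse_to_dense(features["labels"].values, [VOCAB_SIZE], 1.0,
                                validate_indices=False)
    # Concatenate visual and audio features into one input vector.
    inputs = tf.concat([features["mean_rgb"], features["mean_audio"]], axis=0)
    return inputs, labels

dataset = (tf.data.TFRecordDataset(RECORD_PATH)
           .map(parse_video_level)
           .batch(32))
inputs, labels = dataset.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    batch_inputs, batch_labels = sess.run([inputs, labels])
    print(batch_inputs.shape, batch_labels.shape)  # e.g. (32, 1152) (32, 3862)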

The 2nd YouTube-8M Video Understanding Challenge
The 2nd YouTube-8M Video Understanding Challenge invites participants to build audio-visual content classification models using YouTube-8M as training data, and then to label an unknown subset of test videos. Unlike last year, we impose a hard limit on model size, encouraging participants to advance a single model within a tight budget rather than assembling as many models as possible. Each of the top 5 teams will be awarded $5,000 to support their travel to Munich to attend ECCV’18. For details, please visit the Kaggle competition page.
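
As a small sketch of how one might sanity-check a trained single model against the size limit, the snippet below sums the on-disk size of a TensorFlow checkpoint's files; the checkpoint path is hypothetical, and the actual limit and counting rules are those stated on the Kaggle competition page.

import glob
import os

# Hypothetical checkpoint prefix produced by a trained single model.
CKPT_PREFIX = "trained_model/model.ckpt-100000"

# A checkpoint spans several files (index, data shards, meta graph);
# summing their sizes gives a rough check against the size budget.
total_bytes = sum(os.path.getsize(path) for path in glob.glob(CKPT_PREFIX + "*"))
print("Model size: %.1f MB" % (total_bytes / 1e6))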

The 2nd Workshop on YouTube-8M Large-Scale Video Understanding
To be held at ECCV’18, the workshop will consist of invited talks by distinguished researchers, as well as presentations by top-performing challenge participants, in order to facilitate the exchange of ideas. We encourage those who wish to attend to submit papers describing their research, experiments, or applications based on the YouTube-8M dataset, including papers summarizing their participation in the challenge above. Please refer to the workshop page for more details.

It is our hope that this update to the dataset, along with the new challenge and workshop, will continue to advance research in large-scale video understanding. We hope you will join us again!

Acknowledgements
This post reflects the work of many machine perception researchers including Sami Abu-El-Haija, Ke Chen, Nisarg Kothari, Joonseok Lee, Hanhan Li, Paul Natsev, Sobhan Naderi Parizi, Rahul Sukthankar, George Toderici, Balakrishnan Varadarajan, as well as Sohier Dane, Julia Elliott, Wendy Kan and Walter Reade from Kaggle. We are also grateful for the support and advice from our partners at YouTube.