Crowdsource by Google: A Platform for Collecting Inclusive and Representative Machine Learning Data

Supheakmungkol Sarin
Knot Pipatsrisawat
Khiem Pham
Anurag Batra
Luis Valente


This demo paper presents Crowdsource by Google, a platform for collecting inclusive and representative machine learning data to build AI products for everyone. Crowdsource by Google is enjoyed by a global community of passionate individuals who care about their languages and cultures and understand the need for diversity in machine learning and AI. In this paper, we discuss the design principles that guide how we treat our users: a delightful experience, respect, and transparency. These principles make contributing data to Crowdsource by Google an open, trusting, and enjoyable experience. One early result is Open Images Extended, an open-source dataset with over 478,000 images across 6,000+ categories from 70+ countries. This dataset increased the representation of the Indian subcontinent in the original Open Images dataset by an estimated 250%.