No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World

Shreya Shankar; Yoni Halpern; Eric Breck; James Atwood; Jimbo Wilson; D. Sculley

NIPS 2017 workshop: Machine Learning for the Developing World

Abstract

Modern machine learning systems such as image classifiers rely heavily on
large-scale data sets for training. Such data sets are costly to create,
so in practice a small number of freely available, open source data sets
are widely used. Reusing them may be particularly important for ML
applications in the developing world, where resources may be constrained
and the cost of creating suitable large-scale data sets may be a
blocking factor. However, we suggest that examining the geo-diversity
of open data sets is critical before adopting a data set for such use
cases. In particular, we analyze two large, publicly available image
data sets to assess geo-diversity and find that these data sets appear
to exhibit an observable Amerocentric and Eurocentric representation bias.
Further, we perform targeted analysis on classifiers that use these data
sets as training data to assess the impact of these training distributions,
and find strong differences in the relative performance on images from
different locales. These results emphasize the need to ensure
geo-representation when constructing data sets for use in the developing
world.
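
The two measurements described in the abstract, the geographic distribution of a data set and classifier performance broken down by locale, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the record layout, and the toy inputs are all assumptions made for illustration, and the numbers are invented rather than taken from the paper.

```python
from collections import Counter, defaultdict


def country_shares(countries):
    """Return the fraction of images attributed to each country label."""
    counts = Counter(countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def accuracy_by_locale(records):
    """Return classifier accuracy per locale.

    records: iterable of (locale, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for locale, truth, predicted in records:
        seen[locale] += 1
        correct[locale] += int(truth == predicted)
    return {locale: correct[locale] / seen[locale] for locale in seen}


# Toy, made-up inputs that only show the shape of the computation.
toy_countries = ["US", "US", "GB", "IN"]
toy_records = [
    ("US", "label_a", "label_a"),
    ("US", "label_b", "label_b"),
    ("IN", "label_a", "label_b"),
    ("IN", "label_b", "label_b"),
]

shares = country_shares(toy_countries)
per_locale = accuracy_by_locale(toy_records)
```

A heavily skewed `shares` dictionary would point to a representation imbalance, and a large gap between entries of `per_locale` would point to locale-dependent performance of the kind the abstract reports.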

Research Areas

Machine intelligence
