Towards A Fairer Landmark Recognition Dataset

Bingyi Cao
Cam Askew
Jack Sim
Mike Green
N'Mah Fodiatu Yilla-Akbari
Zu Kim
arXiv (2021)
We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. These relevances are estimated by combining anonymized Google Maps user contribution statistics with the contributors’ demographic information. We present a stratification approach and analysis which leads to a much fairer coverage of the world, compared to existing datasets. The resulting datasets are used to evaluate computer vision models as part of the Google Landmark Recognition and Retrieval Challenges 2021.