
Adversarial Test Set for Image Classification: Lessons Learned from CATS4ML Data Challenge

Praveen Kumar Paritosh
NeurIPS 2021 Datasets and Benchmarks Track (2021)

Abstract

A primary role of data in ML is to serve as benchmarks that allow us to measure progress. Items that are difficult or carry the natural ambiguity of real-world context are often relatively underrepresented in evaluation datasets and benchmarks. This absence of ambiguous real-world examples in evaluation undermines our ability to reliably test machine learning performance, and it results in unknown unknowns in an ML model's behavior, which pose a significant risk when such models are deployed. We designed and ran a public data challenge to proactively discover unknown unknowns in state-of-the-art image classification models applied to the Open Images v6 dataset. In this paper, we describe the design and implementation of the AAAI HCOMP CATS4ML 2020 challenge. Participants in this challenge were incentivized to find images that are incorrectly classified by the ML models. We present a set of failure modes in state-of-the-art image classification, abstracted from the 13,000 submissions to this challenge, and we present a black-swan benchmark test set based on the challenge.
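To make the notion of an unknown unknown concrete, the following is a minimal, hypothetical sketch (not the challenge's actual scoring pipeline) of how one might flag a submitted image as a candidate unknown unknown: the model is both wrong and highly confident. The choice of a ResNet-50 pretrained on ImageNet, the `confidence_threshold` value, and the function name are illustrative assumptions; the challenge itself used image classification models over Open Images v6 labels.

```python
# Hypothetical illustration: flag a submission as a candidate "unknown unknown"
# when a pretrained classifier confidently predicts the wrong class.
import torch
from torchvision import models, transforms
from PIL import Image

# Standard ImageNet preprocessing for the pretrained backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.eval()

def is_candidate_unknown_unknown(image_path: str, true_label_idx: int,
                                 confidence_threshold: float = 0.9) -> bool:
    """Return True if the model is confidently wrong on this image."""
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1).squeeze(0)
    confidence, predicted_idx = probs.max(dim=0)
    # Confidently misclassified examples are the ones evaluation sets tend to miss.
    return (predicted_idx.item() != true_label_idx
            and confidence.item() >= confidence_threshold)
```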