Slice Finder: Automated Data Slicing for Model Validation

Neoklis Polyzotis
Steven Whang
Tim Klas Kraska
Yeounoh Chung
Proceedings of the IEEE Int'l Conf. on Data Engineering (ICDE), 2019 (to appear)

Abstract

As machine learning (ML) systems become democratized,
helping users easily debug their models becomes increasingly
important. Yet current data tools are still primitive when
it comes to helping users trace model performance problems
all the way to the data. We focus on the particular problem
of slicing data to identify subsets of the training data
where the model performs poorly. Unlike general techniques
(e.g., clustering) that can find arbitrary slices, our goal is to
find interpretable slices (which are easier to act on than
arbitrary subsets) that are both problematic and large.
We propose Slice Finder, an interactive framework
for identifying such slices using statistical techniques. The
slices can be used in applications such as diagnosing model
fairness and detecting fraud, where slices must be described
in terms that humans can interpret.
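To make "interpretable, problematic, and large" concrete, here is a minimal Python sketch under stated assumptions. It assumes that slices are single `feature == value` predicates over categorical features, that a slice is "problematic" when its mean per-example loss exceeds that of its complement by a standardized effect size of at least `min_effect`, and that a Welch's t statistic above `min_t` serves as a rough significance filter. The function `find_problem_slices`, its thresholds, and the toy data are hypothetical illustrations, not Slice Finder's actual algorithm or API.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, variance


@dataclass
class Slice:
    feature: str
    value: object
    size: int
    effect_size: float
    t_stat: float


def _ratio(diff: float, denom: float) -> float:
    """Divide diff by denom, mapping a zero denominator to 0 or +/-inf."""
    if denom > 0:
        return diff / denom
    return 0.0 if diff == 0 else math.copysign(math.inf, diff)


def find_problem_slices(rows, losses, min_size=2, min_effect=0.4, min_t=2.0):
    """Hypothetical sketch: single-feature slices whose mean loss is notably higher.

    rows:   list of dicts mapping feature name -> categorical value
    losses: per-example model loss, aligned with rows
    """
    n = len(rows)
    # Group example indices by each (feature, value) predicate.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        for feature, value in row.items():
            groups[(feature, value)].append(i)

    found = []
    for (feature, value), idx in groups.items():
        # statistics.variance() needs at least two points on each side.
        if len(idx) < max(min_size, 2) or n - len(idx) < 2:
            continue
        members = set(idx)
        inside = [losses[i] for i in idx]
        outside = [losses[i] for i in range(n) if i not in members]
        diff = mean(inside) - mean(outside)
        v_in, v_out = variance(inside), variance(outside)
        # Standardized mean difference using the average of the two sample variances.
        effect = _ratio(diff, math.sqrt((v_in + v_out) / 2))
        # Welch's t statistic (unequal variances), used only as a rough filter here.
        t_stat = _ratio(diff, math.sqrt(v_in / len(inside) + v_out / len(outside)))
        if effect >= min_effect and t_stat >= min_t:
            found.append(Slice(feature, value, len(idx), effect, t_stat))

    # Prefer larger slices first, then larger effects.
    found.sort(key=lambda s: (-s.size, -s.effect_size))
    return found


# Hypothetical toy data: the model does much worse on country == "B".
rows = [
    {"country": "A", "device": "mobile"},
    {"country": "A", "device": "desktop"},
    {"country": "A", "device": "mobile"},
    {"country": "A", "device": "desktop"},
    {"country": "B", "device": "mobile"},
    {"country": "B", "device": "desktop"},
    {"country": "B", "device": "mobile"},
]
losses = [0.1, 0.2, 0.1, 0.2, 0.9, 0.8, 1.0]

result = find_problem_slices(rows, losses)
for s in result:
    print(s.feature, s.value, s.size, round(s.effect_size, 2))  # prints: country B 3 9.19
```

Sorting by size before effect size mirrors the stated goal of finding slices that are both large and problematic; a fuller search would also consider conjunctions of predicates (e.g., `country == B AND device == mobile`), which remain interpretable while narrowing in on smaller problem areas.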