Google Research

Addressing Stability in Classifier Explanations

  • Amitabha Roy
  • Dake He
  • Nasrin Baratalipour
  • Pranjul Yadav
  • Siavash Samiei
2021 IEEE International Conference on Big Data


Machine-learning-based classifiers are often a black box when considering the contribution of inputs to the output probability of a label, especially with complex non-linear models such as neural networks. A popular way to explain machine learning model outputs in a model-independent manner is through the use of Shapley values. We discuss the problem of instability when using Shapley values in explanations, where we found explanations to vary due to random sampling in the algorithm. We show how this problem can be effectively addressed using Monte Carlo integration, in the form of averaging the model output while varying only a subset of features in the example to be explained. This unlocks the use of Shapley-value-based explainers for a variety of classifiers, including neural networks.
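The idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `model` is a toy logistic classifier introduced here for demonstration, and `shapley_permutation` is a standard permutation-sampling Shapley estimator. The Monte Carlo integration step corresponds to averaging the model output over `n_mc` background draws for the features that have not yet been "revealed", which reduces the variance of the resulting explanation.

```python
import numpy as np

def model(x):
    # Toy classifier for illustration: logistic function of a linear score.
    w = np.array([2.0, -1.0, 0.5])
    return 1.0 / (1.0 + np.exp(-x @ w))

def shapley_permutation(x, background, n_perms, n_mc, rng):
    """Permutation-sampling estimate of Shapley values for model(x).

    Features not yet revealed are filled with values drawn from the
    background data; averaging the model over n_mc such draws is the
    Monte Carlo integration step that stabilises the estimate.
    """
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_perms):
        perm = rng.permutation(d)
        # n_mc background rows stand in for the "absent" features.
        masked = background[rng.integers(len(background), size=n_mc)].copy()
        prev = model(masked).mean()      # expected output with no features fixed
        for i in perm:
            masked[:, i] = x[i]          # reveal feature i in every MC sample
            cur = model(masked).mean()   # average over the remaining random features
            phi[i] += cur - prev         # marginal contribution of feature i
            prev = cur
    return phi / n_perms
```

Because the marginal contributions telescope within each permutation, the estimated values satisfy the Shapley efficiency property: they sum to `model(x)` minus the average model output over the background samples, and larger `n_mc` makes repeated runs agree more closely.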
