Addressing Stability in Classifier Explanations
Abstract
Machine learning-based classifiers are often black boxes when it comes to how each input contributes to the output probability of a label, especially for complex non-linear models such as neural networks. A popular way to explain model outputs in a model-independent manner is through Shapley values. We discuss the problem of instability in Shapley value explanations: because the algorithm relies on random sampling, explanations can vary from run to run. We show that this problem can be effectively addressed with Monte Carlo integration, averaging the model output while varying only a subset of the features of the example being explained. This unlocks the use of Shapley value-based explainers for a variety of classifiers, including neural networks.
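To make the averaging step concrete, here is a minimal sketch of the kind of Monte Carlo estimate described above: the model output is averaged while the features in a chosen subset stay fixed at the explained example's values and the remaining features are drawn from background data. The function name `coalition_value`, the `model` callable, and the background dataset are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def coalition_value(model, x, subset, background, n_samples=200, rng=None):
    """Monte Carlo estimate of the model output when only the features in
    `subset` are held fixed at their values in `x`, with the remaining
    features sampled from a background dataset.

    model      -- callable returning output probabilities for a batch of examples
    x          -- 1-D array, the example being explained
    subset     -- indices of the features held fixed at x's values
    background -- 2-D array of background examples used to fill in the other features
    """
    rng = rng or np.random.default_rng()
    # Draw background rows to supply values for the features outside `subset`.
    idx = rng.integers(0, len(background), size=n_samples)
    samples = background[idx].copy()
    # Overwrite the chosen subset with the explained example's feature values.
    samples[:, subset] = x[subset]
    # Averaging over many samples stabilises the estimate, reducing run-to-run
    # variation caused by random sampling.
    return model(samples).mean()
```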