Beyond Model Interpretability: Socio-Structural Explanations in Machine Learning
Abstract
What is it to explain the outputs of an opaque machine learning model? A popular strategy in the literature is to develop
explainable machine learning techniques. These techniques approximate how the model works by providing local or global
information about its inner workings. In this paper, we argue that, in some cases, explaining
machine learning outputs requires appealing to a third kind of explanation, which we call socio-structural explanations.
The importance of socio-structural explanations is motivated by the observation that machine learning models are not
autonomous mathematico-computational entities. Instead, their very existence is intrinsically tied to the social context in
which they operate. Sometimes, social structures are mirrored in the design and training of machine learning models,
and in such cases a socio-structural explanation is the relevant explanation of why an output is obtained.
By thoroughly examining a well-known case of racially biased algorithmic resource allocation in healthcare, we highlight
the significance of socio-structural explanations. One ramification of our proposal is that to understand how machine
learning models perpetuate unjust social harms, more is needed than interpreting them with model interpretability methods.
Providing socio-structural explanations adds explanatory adequacy as to how and why machine learning outputs
are obtained.