On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Abstract
Concept-based explanations can be a key direction for understanding how DNNs make decisions. In this paper, we study concept-based explainability in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining the model's prediction behavior. Motivated by performance and variability considerations, we propose two definitions to quantify completeness, and show that, under certain assumptions, they yield the commonly-used PCA method. Next, we study two additional constraints, based on sparsity principles, that ensure the interpretability of the discovered concepts. Through systematic experiments on a specifically-designed synthetic dataset and on real-world text and image datasets, we demonstrate the superiority of our framework in finding concepts that are both complete (in explaining the model's decisions) and interpretable.
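To make the idea of a performance-based completeness score concrete, the following is a minimal sketch, not the authors' implementation. It assumes the model is split into a feature extractor `phi` and a classification head `head_predict`, that concept scores are dot products of intermediate features with concept vectors, and that a simple linear probe stands in for the mapping from concept scores back to predictions; the names `completeness_score`, `phi`, `head_predict`, and `concepts` are hypothetical.

```python
"""Illustrative sketch of a performance-based completeness score:
the fraction of the model's above-chance accuracy that can be
recovered from concept scores alone. Hypothetical, for intuition only."""
import numpy as np
from sklearn.linear_model import LogisticRegression


def completeness_score(phi, head_predict, concepts, X, y, num_classes):
    feats = phi(X)                       # (n, d) intermediate features
    concept_scores = feats @ concepts.T  # (n, m) dot-product concept scores

    # Accuracy of the original model and a random-guessing baseline.
    acc_model = np.mean(head_predict(feats) == y)
    acc_random = 1.0 / num_classes

    # A simple probe maps concept scores to predictions; a richer mapping
    # could be used instead (this choice is an assumption of the sketch).
    probe = LogisticRegression(max_iter=1000).fit(concept_scores, y)
    acc_concepts = probe.score(concept_scores, y)

    # Completeness: share of above-chance accuracy retained by the concepts.
    return (acc_concepts - acc_random) / (acc_model - acc_random + 1e-12)


# Toy usage with random features and a linear head (purely illustrative):
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3))               # hypothetical head weights
phi = lambda X: X                          # identity "feature extractor"
head_predict = lambda F: (F @ W).argmax(1)
X = rng.normal(size=(500, 10))
y = head_predict(X)
concepts = W.T / np.linalg.norm(W.T, axis=1, keepdims=True)  # 3 concept directions
print(completeness_score(phi, head_predict, concepts, X, y, num_classes=3))
```

In this toy setting the concept directions span the head's weight vectors, so the score is close to 1; a concept set that misses directions the head relies on would score lower.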