Google Research

On Completeness-aware Concept-Based Explanations in Deep Neural Networks

NeurIPS (2020) (to appear)


Concept-based explanations can be a key direction for understanding how DNNs make decisions. In this paper, we study concept-based explainability within a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is for explaining the model's behavior. Motivated by performance and by variability, we propose two definitions to quantify completeness, and we show that, under certain assumptions, they yield the commonly used PCA method. Next, we study two additional constraints, based on sparsity principles, to ensure the interpretability of the discovered concepts. Through systematic experiments on a specifically designed synthetic dataset and on real-world text and image datasets, we demonstrate the superiority of our framework in finding concepts that are both complete (in explaining the decision) and interpretable.
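The abstract does not spell out the two completeness definitions. As a rough, hypothetical illustration of the variability-based intuition, and of why PCA can appear, the sketch below scores a set of orthonormal concept vectors by the fraction of centered activation variance that their span retains. The function name `variance_completeness`, the toy activations, and this exact score are assumptions made for illustration; they are not the paper's definition or code.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _center(rows: List[Vector]) -> List[List[float]]:
    """Subtract the per-dimension mean from every activation vector."""
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def variance_completeness(activations: List[Vector], concepts: List[Vector]) -> float:
    """Hypothetical variability-style score: the fraction of total centered
    variance retained after projecting activations onto span(concepts).

    Assumes the concept vectors are orthonormal.
    """
    centered = _center(activations)
    total = sum(_dot(r, r) for r in centered)
    if total == 0.0:
        return 1.0  # no variance to explain
    # For orthonormal concepts, the squared norm of the projection equals
    # the sum of squared coefficients along each concept direction.
    captured = sum(_dot(r, c) ** 2 for r in centered for c in concepts)
    return captured / total


# Toy activations (made up): almost all variation lies along the first axis.
acts = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]
print(variance_completeness(acts, [[1.0, 0.0]]))  # close to 1
print(variance_completeness(acts, [[0.0, 1.0]]))  # close to 0
```

Among all sets of m orthonormal directions, the top-m principal components maximize this retained-variance fraction, which is one way to see how a variability-motivated criterion can lead to PCA. The paper's own definitions, and the assumptions under which they reduce to PCA, may differ from this simplified score.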
