Google Research

Explaining Deep Neural Networks using Unsupervised Clustering

2020 Workshop on Human Interpretability in Machine Learning (2020)


We propose a novel method to explain trained deep neural networks (DNNs), by distilling them into surrogate models using unsupervised clustering. Our method can be flexibly applied to any subset of layers of a DNN architecture and can incorporate low-level and high-level information. On image datasets given pre-trained DNNs, we demonstrate strength of our method in finding similar training samples, and shedding light on the concepts the DNN bases its decision on. Via user studies, we show that our model can improve user trust in model’s prediction.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work