The Building Blocks of Interpretability

  • Christopher Olah
  • Arvind Satyanarayan
  • Ian Johnson
  • Shan Carter
  • Ludwig Schubert
  • Katherine Ye
  • Alexander Mordvintsev
Distill (2018)


Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them, and the rich structure of this combinatorial space.

