Concept Bottleneck Models
Abstract
We seek to learn models that support interventions on high-level concepts: e.g., would the model still have predicted severe arthritis if it didn’t think that there was a bone spur in the x-ray? However, state-of-the-art neural networks are trained end-to-end from raw input (e.g., pixels) to output (e.g., arthritis severity), and do not admit manipulation of high-level concepts like “the existence of bone spurs”. In this paper, we revisit the classic idea of learning concept bottleneck models that first predict concepts (provided at training time) from the raw input, and then predict the final label from these concepts. By construction, we can intervene on the predicted concepts at test time and propagate these changes to the final prediction. On an x-ray dataset and a bird species recognition dataset, concept bottleneck models achieve competitive predictive accuracy with standard end-to-end models, while allowing us to explain predictions in terms of high-level clinical concepts (“bone spurs”) and bird attributes (“wing color”). Moreover, concept bottleneck models allow for richer human-model interaction: model accuracy improves significantly if we can correct model mistakes on concepts at test time.
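To make the architecture concrete, the following is a minimal sketch (not the authors' code) of a concept bottleneck model with test-time concept intervention, written in PyTorch. The class name, layer sizes, and the dictionary-based intervention interface are illustrative assumptions; the only part taken from the abstract is the structure itself: raw input → predicted concepts → final label, with the option to overwrite predicted concepts before the label is computed.

```python
# Illustrative sketch of a concept bottleneck model (assumed names and sizes).
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, input_dim, n_concepts, n_classes):
        super().__init__()
        # g: raw input -> predicted concepts (the "bottleneck")
        self.concept_net = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, n_concepts)
        )
        # f: predicted concepts -> final label
        self.label_net = nn.Linear(n_concepts, n_classes)

    def forward(self, x, concept_intervention=None):
        c_logits = self.concept_net(x)
        c_hat = torch.sigmoid(c_logits)  # predicted concept probabilities
        if concept_intervention is not None:
            # Test-time intervention: overwrite selected concepts with known
            # values, e.g. {concept_index: 1.0} to assert a concept is present.
            c_hat = c_hat.clone()
            for idx, value in concept_intervention.items():
                c_hat[:, idx] = value
        y_logits = self.label_net(c_hat)
        return c_logits, y_logits

# Usage: correct a mistaken concept and propagate the change to the label.
model = ConceptBottleneckModel(input_dim=512, n_concepts=10, n_classes=4)
x = torch.randn(1, 512)
_, y_before = model(x)
_, y_after = model(x, concept_intervention={3: 1.0})  # set concept 3 to "true"
```

Because the label head sees only the (possibly corrected) concept vector, any change to a concept propagates directly to the final prediction, which is what enables the human-in-the-loop corrections described above.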