Been Kim
Been is a research scientist at Google Brain. Her research focuses on improving interpretability in machine learning, both by building interpretability methods for already-trained models and by building inherently interpretable models. She holds MS and PhD degrees from MIT.
Been has given tutorials on interpretability at ICML 2017, at the Deep Learning Summer School at the University of Toronto / Vector Institute in 2018, and at CVPR 2018.
Been is one of the executive board members of Women in Machine Learning (WiML), and helps with various ML conferences as a workshop chair, an area chair, a steering committee member, and a program chair.
Authored Publications
Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
Shayegan Omidshafiei
Yannick Assogba
Advances in Neural Information Processing Systems (NeurIPS) (2022) (to appear)
Abstract
Each year, expert-level performance is attained in increasingly-complex multiagent domains, notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent level, detection of behavior changepoints throughout training, discovery of core behavioral concepts, demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain, and also illustrate that the approach can disentangle previously-trained policies in OpenAI's hide-and-seek domain.
DISSECT: Disentangled Simultaneous Explanations via Concept Traversals
Chun-Liang Li
Brian Eoff
Rosalind Picard
International Conference on Learning Representations (ICLR) (2022)
Abstract
Explaining deep learning model inferences is a promising avenue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars. One of the principal benefits of counterfactual explanations is allowing users to explore "what-if" scenarios through what does not and cannot exist in the data, a quality that many other forms of explanation such as heatmaps and influence functions are inherently incapable of doing. However, most previous work on generative explainability cannot disentangle important concepts effectively, produces unrealistic examples, or fails to retain relevant information. We propose a novel approach, DISSECT, that jointly trains a generator, a discriminator, and a concept disentangler to overcome such challenges using little supervision. DISSECT generates Concept Traversals (CTs), defined as a sequence of generated examples with increasing degrees of concepts that influence a classifier's decision. By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts automatically rather than relying on user-predefined concepts. We show that DISSECT produces CTs that (1) disentangle several concepts, (2) are influential to a classifier's decision and are coupled to its reasoning due to joint training, (3) are realistic, (4) preserve relevant information, and (5) are stable across similar inputs. We validate DISSECT on several challenging synthetic and realistic datasets where previous methods fall short of satisfying desirable criteria for interpretability and show that it performs consistently well. Finally, we present experiments showing applications of DISSECT for detecting potential biases of a classifier and identifying spurious artifacts that impact predictions.
Best of both worlds: local and global explanations with human-understandable concepts
Sebastien Baur
Shaobo Hou
Eric Loreaux
Diana Mincu
Ralph Blanes
(2021)
Abstract
Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. `why is this patient diagnosed with this condition') or a class of predictions (global explanation, e.g. `why is this set of patients diagnosed with this condition in general'). While there are many methods focused on either one, few frameworks can provide both local and global explanations in a consistent manner. In this work, we combine two powerful existing techniques, one local (Integrated Gradients, IG) and one global (Testing with Concept Activation Vectors, TCAV), to provide local and global concept-based explanations. We first sanity check our idea using two synthetic datasets with a known ground truth, and further demonstrate with a benchmark natural image dataset. We test our method with various concepts, target classes, model architectures and IG parameters (e.g. baselines). We show that our method improves global explanations over vanilla TCAV when compared to ground truth, and provides useful local insights. Finally, a user study demonstrates the usefulness of the method compared to no or global explanations only. We hope our work provides a step towards building bridges between many existing local and global methods to get the best of both worlds.
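The core computation pairs a concept activation vector (CAV) learned in a layer's activation space with per-example attributions projected onto that vector. Below is a minimal numpy/scikit-learn sketch of that idea, not the paper's implementation; the activation, gradient, and attribution arrays are assumed to come from hooks into the model, and all function names are illustrative.

```python
# Hypothetical sketch: a CAV from concept vs. random activations, a global
# TCAV-style score from class gradients, and a local score that projects an
# integrated-gradients attribution (in the same activation space) onto the CAV.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_cav(concept_acts, random_acts):
    """Fit a linear separator; its (unit-norm) normal vector is the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_[0]
    return v / np.linalg.norm(v)

def global_tcav_score(class_grads, cav):
    """Fraction of examples whose gradient has a positive component along the CAV."""
    return float(np.mean(class_grads @ cav > 0))

def local_concept_attribution(ig_attribution, cav):
    """Project one example's integrated-gradients attribution onto the CAV."""
    return float(ig_attribution @ cav)
```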
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-kuan Yeh
Chun-Liang Li
Pradeep Ravikumar
NeurIPS (2020) (to appear)
Abstract
Concept-based explanations can be a key direction to understand how DNNs make decisions. In this paper, we study concept-based explainability in a systematic framework. First, we define the notion of completeness, which quantifies how sufficient a particular set of concepts is in explaining the model's behavior. Based on performance and variability motivations, we propose two definitions to quantify completeness. We show that they yield the commonly-used PCA method under certain assumptions. Next, we study two additional constraints to ensure the interpretability of discovered concepts, based on sparsity principles. Through systematic experiments on a specifically-designed synthetic dataset and real-world text and image datasets, we demonstrate the superiority of our framework in finding concepts that are complete (in explaining the decision) and that are interpretable.
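One way to read the performance-based notion of completeness: if the model's own predictions can be recovered from activations projected onto the concept directions, the concept set is close to complete. The sketch below illustrates that reading with a simple linear probe; it is an assumed simplification, not the paper's exact estimator, and the input arrays are presumed to come from the model under study.

```python
# Illustrative completeness-style score: predict the model's own labels from
# activations projected onto a set of candidate concept directions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def completeness_score(activations, model_preds, concept_vectors):
    """activations: (n, d); model_preds: (n,) model-predicted labels;
    concept_vectors: (k, d) candidate concept directions."""
    proj = activations @ concept_vectors.T          # (n, k) per-concept scores
    X_tr, X_te, y_tr, y_te = train_test_split(
        proj, model_preds, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)                  # how well concepts recover the model
```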
Concept Bottleneck Models
Pang Wei Koh
Thao Nguyen
Yew Siang Tang
Stephen Mussmann
Emma Pierson
Percy Liang
ICML (2020) (to appear)
Abstract
We seek to learn models that support interventions on high-level concepts: e.g., would the model have predicted severe arthritis if it didn’t think that there was a bone spur in the x-ray? However, state-of-the-art neural networks are trained end-to-end from raw input (e.g., pixels) to output (e.g., arthritis severity), and do not admit manipulation of high-level concepts like “the existence of bone spurs”. In this paper, we revisit the classic idea of learning concept bottleneck models that first predict concepts (provided at training time) from the raw input, and then predict the final label from these concepts. By construction, we can intervene on the predicted concepts at test time and propagate these changes to the final prediction. On an x-ray dataset and bird species recognition dataset, concept bottleneck models achieve competitive predictive accuracy with standard end-to-end models, while allowing us to explain predictions in terms of high-level clinical concepts (“bone spurs”) and bird attributes (“wing color”). Moreover, concept bottleneck models allow for richer human-model interaction: model accuracy improves significantly if we can correct model mistakes on concepts at test time.
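A concept bottleneck has two stages, input → concepts → label, and a test-time intervention simply overwrites a predicted concept before the label head sees it. The sketch below shows one way to wire this up (roughly an independently-trained variant) with scikit-learn stand-ins for the actual neural networks; the class and argument names are illustrative, not the paper's code.

```python
# Minimal concept-bottleneck sketch: x -> predicted concepts -> label, with a
# test-time "intervention" that replaces a predicted concept with a corrected value.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

class ConceptBottleneck:
    def __init__(self):
        self.x_to_c = MultiOutputClassifier(LogisticRegression(max_iter=1000))
        self.c_to_y = LogisticRegression(max_iter=1000)

    def fit(self, X, C, y):                 # C: (n, k) binary concept annotations
        self.x_to_c.fit(X, C)               # concept predictor from raw input
        self.c_to_y.fit(C, y)               # label head trained on true concepts
        return self

    def predict(self, X, interventions=None):
        C_hat = np.asarray(self.x_to_c.predict(X), dtype=float)
        if interventions:                   # e.g. {concept_index: corrected_value}
            for j, value in interventions.items():
                C_hat[:, j] = value         # expert corrects a concept at test time
        return self.c_to_y.predict(C_hat)
```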
Human-Centered Tools for Coping with Imperfect Algorithms during Medical Decision-Making
Jason Hipp
Daniel Smilkov
Martin Stumpe
Conference on Human Factors in Computing Systems (2019)
Abstract
Machine learning (ML) is increasingly being used in image retrieval systems for medical decision making. One application of ML is to retrieve visually similar medical images from past patients (e.g. tissue from biopsies) to reference when making a medical decision with a new patient. However, no algorithm can perfectly capture an expert's ideal notion of similarity for every case: an image that is algorithmically determined to be similar may not be medically relevant to a doctor's specific diagnostic needs. In this paper, we identified the needs of pathologists when searching for similar images retrieved using a deep learning algorithm, and developed tools that empower users to cope with the search algorithm on-the-fly, communicating what types of similarity are most important at different moments in time. In two evaluations with pathologists, we found that these refinement tools increased the diagnostic utility of images found and increased user trust in the algorithm. The tools were preferred over a traditional interface, without a loss in diagnostic accuracy. We also observed that users adopted new strategies when using refinement tools, re-purposing them to test and understand the underlying algorithm and to disambiguate ML errors from their own errors. Taken together, these findings inform future human-ML collaborative systems for expert decision-making.
Abstract
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Most of the current explanation methods provide explanations through feature importance scores, which identify features that are salient for each individual input. However, how to systematically summarize and interpret such per-sample feature importance scores is itself challenging. In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher-level human-understandable concepts that apply across the entire dataset. We develop a new algorithm, ACE, to automatically extract visual concepts. Our systematic experiments demonstrate that ACE discovers concepts that are human-meaningful, coherent and salient for the neural network's predictions.
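While the abstract stays high level, the spirit of automatic concept extraction can be illustrated as: embed many image segments with the model and cluster the embeddings, so each cluster becomes a candidate visual concept. The sketch below is an assumed simplification along those lines, not the paper's pipeline; the segmentation step and the embedding hook are left out, and all names are illustrative.

```python
# Illustrative concept-discovery sketch: cluster model embeddings of image
# segments; each cluster groups visually similar segments as a candidate concept.
import numpy as np
from sklearn.cluster import KMeans

def candidate_concepts(segment_embeddings, n_concepts=10):
    """segment_embeddings: (n_segments, d) activations of segment crops."""
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=0)
    labels = km.fit_predict(segment_embeddings)
    # The importance of each cluster for a class could then be scored with a
    # concept-importance test (e.g. a TCAV-style directional-derivative test).
    return [np.where(labels == k)[0] for k in range(n_concepts)]
```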
Learning how to explain neural networks: PatternNet and PatternAttribution
Kristof T. Schütt
Maximilian Alber
Klaus-Robert Müller
Sven Dähne
ICLR (2018)
Abstract
DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models. Based on our analysis of linear models we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
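The linear-model argument can be seen in a toy example: for y = wᵀx, the weight vector w (the filter) need not point in the direction of the signal, whereas the data-derived pattern a = cov(x, y) / var(y) does. The numpy toy below, with assumed synthetic data, illustrates that distinction; it is a sketch of the argument, not the paper's code.

```python
# Toy illustration: the filter w cancels a distractor, so it is not the signal
# direction; the pattern a = cov(x, y) / var(y) recovers the signal direction.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
s = rng.normal(size=n)                      # latent signal
d = rng.normal(size=n)                      # distractor
X = np.column_stack([s + 0.5 * d, d])       # dim 0 carries signal plus leaked distractor
w = np.array([1.0, -0.5])                   # filter that cancels the distractor
y = X @ w                                   # exactly recovers s

C = np.cov(np.vstack([X.T, y]))             # joint covariance of (x1, x2, y)
a = C[:2, 2] / C[2, 2]                      # pattern: cov(x, y) / var(y)
print("filter  w:", w)                      # [ 1.0, -0.5]
print("pattern a:", np.round(a, 2))         # ~[ 1.0,  0.0] -> the true signal direction
```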
Abstract
Knowing when a classifier's prediction can be trusted is useful in many applications and critical for safely using AI. While the bulk of the effort in machine learning research has been towards improving classifier performance, understanding when a classifier's predictions should and should not be trusted has received far less attention. The standard approach is to use the classifier's discriminant or confidence score; however, we show there exists an alternative that is more effective in many situations. We propose a new score, called the trust score, which measures the agreement between the classifier and a modified nearest-neighbor classifier on the testing example. We show empirically that high (low) trust scores produce surprisingly high precision at identifying correctly (incorrectly) classified examples, consistently outperforming the classifier's confidence score as well as many other baselines. Further, under some mild distributional assumptions, we show that if the trust score for an example is high (low), the classifier will likely agree (disagree) with the Bayes-optimal classifier. Our guarantees consist of non-asymptotic rates of statistical consistency under various nonparametric settings and build on recent developments in topological data analysis.
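A simplified reading of the trust score is the ratio between the distance to the nearest training example of the closest non-predicted class and the distance to the nearest example of the predicted class; the paper additionally filters out low-density points, which the sketch below omits. Function and variable names are illustrative.

```python
# Simplified trust-score-style metric: a high ratio means the test point sits
# much closer to its predicted class than to any other class.
import numpy as np

def trust_scores(train_X, train_y, test_X, test_preds):
    classes = np.unique(train_y)
    scores = []
    for x, pred in zip(test_X, test_preds):
        # distance to the nearest training example of each class
        dists = {c: np.min(np.linalg.norm(train_X[train_y == c] - x, axis=1))
                 for c in classes}
        d_pred = dists[pred]
        d_other = min(d for c, d in dists.items() if c != pred)
        scores.append(d_other / (d_pred + 1e-12))   # higher -> more trustworthy
    return np.array(scores)
```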
Abstract
We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user studies to find models that are both predictive and interpretable and demonstrate our approach on several data sets. Our human subjects results show trends towards different proxy notions of interpretability on different datasets, which suggests that different proxies are preferred on different tasks.
View details