Katja Filippova

Katja Filippova is currently a research scientist at Google.
Authored Publications
    Understanding Text Classification Data and Models Using Aggregated Input Salience
    Sebastian Ebert
    Alice Shoshana Jakobovits
    R2HCAI: The AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI
    Realizing when a model is right for a wrong reason is not trivial and requires a significant effort by model developers. In some cases an input salience method, which highlights the most important parts of the input, may reveal problematic reasoning. But scrutinizing highlights over many data instances is tedious and often infeasible. Furthermore, analyzing examples in isolation does not reveal general patterns in the data or in the model's behavior. In this paper we aim to address these issues and go from understanding single examples to understanding entire datasets and models. The methodology we propose is based on aggregated salience maps, to which we apply clustering, nearest neighbor search and visualizations. Using this methodology we address multiple distinct but common model developer needs by showing how problematic data and model behavior can be identified and explained, a necessary first step for improving the model.
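    As a rough illustration of the aggregation idea (a minimal sketch with an illustrative vocabulary-based pooling and off-the-shelf clustering, not the authors' exact pipeline), per-token salience scores can be pooled into fixed-size vectors per example and then clustered so that recurring patterns become visible:

        # Minimal sketch: pool per-token salience into per-example vectors,
        # then cluster and index them to surface recurring patterns.
        # The pooling scheme and KMeans are illustrative choices.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.neighbors import NearestNeighbors

        def aggregate_salience(tokens, scores, vocab):
            """Pool token-level salience scores into a fixed-size vector over a vocabulary."""
            vec = np.zeros(len(vocab))
            for tok, score in zip(tokens, scores):
                if tok in vocab:
                    vec[vocab[tok]] += score
            norm = np.linalg.norm(vec)
            return vec / norm if norm > 0 else vec

        def cluster_salience_maps(salience_maps, vocab, n_clusters=10):
            """Cluster aggregated salience vectors; also build a nearest-neighbor index."""
            X = np.stack([aggregate_salience(t, s, vocab) for t, s in salience_maps])
            labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
            index = NearestNeighbors(n_neighbors=5).fit(X)
            return labels, index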
    We investigate a formalism for the conditions of a successful explanation of AI. We consider “success” to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. The theory of mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide us with a “language” with which humans understand behavior. We use these folk concepts as a framework of social attribution by the human explainee (the information constructs that humans are likely to comprehend from explanations) by introducing a blueprint for an explanatory narrative that explains AI behavior with these constructs. We then demonstrate that many XAI methods today can be mapped to folk concepts of behavior in a qualitative evaluation. This allows us to uncover their failure modes that prevent current methods from explaining successfully, i.e., the information constructs that are missing for any given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior.
    Feature attribution methods, a.k.a. input salience methods, which assign an importance score to each input feature, are abundant but may produce surprisingly different results for the same model on the same input. While differences are expected if disparate definitions of importance are assumed, most methods claim to provide faithful attributions and to point at the features most relevant for a model's prediction. Existing work on faithfulness evaluation is not conclusive and does not provide a clear answer as to how different methods are to be compared. Focusing on text classification and the model debugging scenario, we propose a protocol for faithfulness evaluation which makes use of partially synthetic data to obtain ground truth for feature importance ranking. Following the protocol, we do an in-depth analysis of four standard salience method classes on a range of datasets and shortcuts for BERT and LSTM models. We demonstrate that some of the most common method configurations provide poor results even for the simplest shortcuts, while a method judged to be too simplistic works remarkably well for BERT.
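    The protocol can be pictured with a toy sketch (the shortcut token, injection scheme and precision-style metric below are simplified illustrations under our own assumptions, not the paper's exact setup): a synthetic token that determines the label is injected into part of the data, so the ground-truth most-important feature is known, and a salience method is scored by how often it ranks that token first.

        # Sketch: inject a synthetic shortcut token whose presence determines the
        # label, then check whether a salience method ranks it as most important.
        import random

        SHORTCUT = "zeroa"  # illustrative synthetic token tied to the positive class

        def inject_shortcut(texts, labels, rate=0.5, seed=0):
            """Prepend the shortcut to a fraction of examples and force their label to 1."""
            rng = random.Random(seed)
            out = []
            for text, label in zip(texts, labels):
                if rng.random() < rate:
                    out.append((SHORTCUT + " " + text, 1))  # ground truth: SHORTCUT matters
                else:
                    out.append((text, label))
            return out

        def precision_at_1(salience_fn, examples):
            """Fraction of shortcut examples where the top-ranked token is the shortcut."""
            hits, total = 0, 0
            for text, _ in examples:
                tokens = text.split()
                if SHORTCUT not in tokens:
                    continue
                scores = salience_fn(tokens)  # one importance score per token
                top = max(range(len(tokens)), key=lambda i: scores[i])
                hits += tokens[top] == SHORTCUT
                total += 1
            return hits / max(total, 1)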
    We need to talk about random splits
    Anders Østerskov Søgaard
    Sebastian Ebert
    Proceedings of the 2021 Conference of the European Chapter of the Association for Computational Linguistics (EACL) (to appear)
    Gorman and Bedrick (2019) argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. We can also split data in biased or adversarial ways, e.g., training on short sentences and evaluating on long ones. Biased sampling has been used in domain adaptation to simulate real-world drift; this is known as the covariate shift assumption. In NLP, however, even worst-case splits, maximizing bias, often under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This invalidates the covariate shift assumption. Future benchmarks should ideally include multiple, independent test sets; if that is infeasible, we argue that multiple biased splits lead to more realistic performance estimates than multiple random splits.
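    A biased split of the kind mentioned above can be produced with a few lines (a sketch under our own assumptions about the data format, not the paper's experimental code): train on the shortest sentences, evaluate on the longest ones, and compare against a random split.

        # Sketch: random split vs. a length-biased (worst-case style) split.
        import random

        def random_split(examples, test_frac=0.2, seed=0):
            """Shuffle and split, as in a standard random split."""
            rng = random.Random(seed)
            shuffled = list(examples)
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * (1 - test_frac))
            return shuffled[:cut], shuffled[cut:]

        def length_biased_split(examples, test_frac=0.2):
            """Train on short sentences, test on long ones."""
            ordered = sorted(examples, key=lambda ex: len(ex.split()))
            cut = int(len(ordered) * (1 - test_frac))
            return ordered[:cut], ordered[cut:]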
    Controlling Machine Translation for Multiple Aspects with Additive Interventions
    Andrea Schioppa
    Artem Sokolov
    David Vilar Torres
    Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp. 6676-6696
    Fine-grained control of machine translation (MT) outputs along multiple aspects is critical for many modern MT applications and is a requirement for gaining users' trust. A standard approach for exerting control in MT is to prepend the input with a special tag to signal the desired output aspect. Despite its simplicity, aspect tagging has several drawbacks: continuous values must be binned into discrete categories, which is unnatural for certain applications, and interference between multiple tags is poorly understood and requires fine-tuning. We address these problems by introducing vector-valued interventions which allow for fine-grained control over multiple aspects simultaneously via a weighted linear combination of the corresponding vectors. For some aspects, our approach even allows for fine-tuning a model trained without annotations to support such interventions. In experiments with three aspects (length, politeness and monotonicity) and two language pairs (English to German and English to Japanese), our models achieve better control over a wider range of tasks compared to tagging, and translation quality does not degrade when no control is requested. Finally, we demonstrate how to enable control in an already trained model after a relatively cheap fine-tuning stage.
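    The central mechanism, adding a weighted combination of learned aspect vectors to the encoder input, can be sketched as a small PyTorch module (aspect names, dimensions and the injection point are illustrative assumptions, not the authors' implementation):

        # Sketch: one learned vector per aspect; a weighted sum of these vectors is
        # added to every input embedding to steer the translation.
        import torch
        import torch.nn as nn

        class AdditiveIntervention(nn.Module):
            def __init__(self, d_model=512, aspects=("length", "politeness", "monotonicity")):
                super().__init__()
                self.aspects = list(aspects)
                self.vectors = nn.Parameter(torch.zeros(len(self.aspects), d_model))

            def forward(self, token_embeddings, weights):
                """token_embeddings: (batch, seq, d_model); weights: (batch, n_aspects)."""
                intervention = weights @ self.vectors        # (batch, d_model)
                return token_embeddings + intervention.unsqueeze(1)

    Setting a weight to zero leaves the corresponding aspect unconstrained, which fits the requirement above that translation quality should not degrade when no control is requested.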
    The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
    Proceedings of the 2020 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
    There is a recent surge of papers that focus on attention as explanation of model predictions, giving mixed evidence on whether attention can be used as such. This has led some to try and 'improve' attention so as to make it more interpretable. We argue that we should pay attention no heed. While attention conveniently gives us one weight per input token and is easily extracted, it is often unclear towards what goal it is used as explanation. We argue that often that goal, whether explicitly stated or not, is to find out what input tokens are the most relevant to a prediction. When that is the case, input saliency methods better suit our needs, and there are no compelling reasons to use attention, despite the coincidence that it provides a weight for each input. With this position paper, we hope to shift some of the recent focus on attention to saliency methods, and for authors to clearly state the goal for their explanations.
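    For contrast with attention weights, a basic gradient-times-input salience score can be computed directly against the prediction (a generic sketch that assumes a model mapping embeddings to class logits; it is not tied to any particular architecture):

        # Sketch: gradient x input salience, measuring how much each token's
        # embedding contributes to the score of the predicted class.
        import torch

        def grad_x_input_salience(model, embeddings, target_class):
            """embeddings: (1, seq_len, d_model); returns one salience score per token."""
            embeddings = embeddings.clone().detach().requires_grad_(True)
            logits = model(embeddings)              # assumed: embeddings -> (1, n_classes)
            logits[0, target_class].backward()
            return (embeddings.grad * embeddings).sum(dim=-1).squeeze(0)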
    Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, i.e., generate fluent but unsupported text. Our contribution is a simple but powerful technique to control such hallucinations without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique both in an automatic and in a human evaluation.
    Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!
    Katharina Kann
    Proceedings of the 22nd Conference on Computational Natural Language Learning, Association for Computational Linguistics, Brussels, Belgium (2018), pp. 313-323
    Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.
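    The SLOR score itself is a short computation: the sentence-level language model log-probability is corrected by the unigram log-probability of its tokens and normalized by sentence length. A minimal sketch, assuming callables for the two log-probabilities:

        # Sketch: SLOR(S) = (log p_LM(S) - sum_t log p_unigram(t)) / |S|
        def slor(tokens, lm_logprob, unigram_logprob):
            """Length-normalized LM score, corrected for token rarity."""
            lm = lm_logprob(tokens)                           # log p_LM of the whole sentence
            unigram = sum(unigram_logprob(t) for t in tokens) # unigram baseline
            return (lm - unigram) / len(tokens)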
    In this paper we study several flavors of variational autoencoders for neural text generation, address methodological issues in current neural text generation research, and close some gaps by answering natural follow-up questions about previously published studies.
    Idest: Learning a Distributed Representation for Event Patterns
    Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL'15), pp. 1140-1149
    This paper describes IDEST, a new method for learning paraphrases of event patterns. It is based on a new neural network architecture that relies only on the weak supervision signal that comes from news articles published on the same day and mentioning the same real-world entities. It can generalize across extractions from different dates to produce a robust paraphrase model for event patterns that can also capture meaningful representations for rare patterns. We compare it with two state-of-the-art systems and show that it attains comparable quality when trained on a small dataset. Its generalization capabilities also allow it to leverage much more data, leading to substantial quality improvements.
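    The weak supervision signal can be illustrated with a toy training-pair construction (a sketch under our own assumptions about the input format, not the IDEST architecture itself): extractions from articles published on the same day that mention the same entities are treated as positive pairs for learning pattern representations.

        # Sketch: build weakly supervised positive pairs of event patterns from
        # extractions that share a publication date and a real-world entity.
        from collections import defaultdict
        from itertools import combinations

        def build_positive_pairs(extractions):
            """extractions: iterable of (pattern, date, entity) triples."""
            buckets = defaultdict(list)
            for pattern, date, entity in extractions:
                buckets[(date, entity)].append(pattern)
            pairs = []
            for patterns in buckets.values():
                pairs.extend(combinations(sorted(set(patterns)), 2))
            return pairs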