Katja Filippova
Katja Filippova is currently a research scientist at Google.
Research Areas
Authored Publications
Preview abstract
We investigate a formalism for the conditions of a successful explanation of AI. We consider “success” to depend not only on what information the explanation contains, but also on what information the human explainee understands from it. Theory of mind literature discusses the folk concepts that humans use to understand and generalize behavior. We posit that folk concepts of behavior provide us with a “language” that humans understand behavior with. We use these folk concepts as a framework of social attribution by the human explainee (the information constructs that humans are likely to comprehend from explanations) by introducing a blueprint for an explanatory narrative (Figure 1) that explains AI behavior with these constructs. We then demonstrate that many XAI methods today can be mapped to folk concepts of behavior in a qualitative evaluation. This allows us to uncover the failure modes that prevent current methods from explaining successfully, i.e., the information constructs that are missing for any given XAI method, and whose inclusion can decrease the likelihood of misunderstanding AI behavior.
Understanding Text Classification Data and Models Using Aggregated Input Salience
Sebastian Ebert
Alice Shoshana Jakobovits
R2HCAI: The AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI
Preview abstract
Realizing when a model is right for a wrong reason is not trivial and requires a significant effort by model developers. In some cases an input salience method, which highlights the most important parts of the input, may reveal problematic reasoning. But scrutinizing highlights over many data instances is tedious and often infeasible. Furthermore, analyzing examples in isolation does not reveal general patterns in the data or in the model’s behavior. In this paper we aim to address these issues and go from understanding single examples to understanding entire datasets and models. The methodology we propose is based on aggregated salience maps, to which we apply clustering, nearest neighbor search and visualizations. Using this methodology we address multiple distinct but common model developer needs by showing how problematic data and model behavior can be identified and explained – a necessary first step for improving the model.
Will you Find these Shortcuts? A Protocol for Evaluating Faithfulness of Input Salience Methods for Text Classification
Sebastian Ebert
Proceedings of EMNLP 2022 (to appear)
Preview abstract
Feature attribution methods, also known as input salience methods, which assign an importance score to a feature, are abundant but may produce surprisingly different results for the same model on the same input. While differences are expected if disparate definitions of importance are assumed, most methods claim to provide faithful attributions and point at features most relevant for a model's prediction. Existing work on faithfulness evaluation is not conclusive and does not provide a clear answer as to how different methods are to be compared.
Focusing on text classification and the model debugging scenario, we propose a protocol for faithfulness evaluation which makes use of partially synthetic data to obtain ground truth for feature importance ranking.
Following the protocol, we do an in-depth analysis of four standard salience method classes on a range of datasets and shortcuts for BERT and LSTM models. We demonstrate that some of the most common method configurations provide poor results even for the simplest shortcuts, while a method judged to be too simplistic works remarkably well for BERT.
Controlling Machine Translation for Multiple Aspects with Additive Interventions
Andrea Schioppa
Artem Sokolov
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic, pp. 6676-6696
Preview abstract
Fine-grained control of machine translation (MT) outputs along multiple aspects is critical for many modern MT applications and is a requirement for gaining users' trust. A standard approach for exerting control in MT is to prepend the input with a special tag to signal the desired output aspect. Despite its simplicity, aspect tagging has several drawbacks: continuous values must be binned into discrete categories, which is unnatural for certain applications; interference between multiple tags is poorly understood and needs fine-tuning. We address these problems by introducing vector-valued interventions which allow for fine-grained control over multiple aspects simultaneously via a weighted linear combination of the corresponding vectors. For some aspects, our approach even allows for fine-tuning a model trained without annotations to support such interventions. In experiments with three aspects (length, politeness and monotonicity) and two language pairs (English to German and Japanese) our models achieve better control over a wider range of tasks compared to tagging, and translation quality does not degrade when no control is requested. Finally, we demonstrate how to enable control in an already trained model after a relatively cheap fine-tuning stage.
We need to talk about random splits
Anders Østerskov Søgaard
Sebastian Ebert
Proceedings of the 2021 Conference of the European Chapter of the Association for Computational Linguistics (EACL) (to appear)
Preview abstract
Gorman and Bedrick (2019) argued for using random splits rather than standard splits in NLP experiments. We argue that random splits, like standard splits, lead to overly optimistic performance estimates. We can also split data in biased or adversarial ways, e.g., training on short sentences and evaluating on long ones. Biased sampling has been used in domain adaptation to simulate real-world drift; this is known as the covariate shift assumption. In NLP, however, even worst-case splits, maximizing bias, often under-estimate the error observed on new samples of in-domain data, i.e., the data that models should minimally generalize to at test time. This invalidates the covariate shift assumption. Instead of using multiple random splits, future benchmarks should ideally include multiple, independent test sets; if infeasible, we argue that multiple biased splits lead to more realistic performance estimates than multiple random splits.
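As a rough illustration of the biased split mentioned in the abstract (training on short sentences and evaluating on long ones), here is a minimal sketch. The function name `length_biased_split`, the whitespace tokenization, and the `max_train_len` threshold are assumptions made for this example and are not taken from the paper.

```python
def length_biased_split(sentences, max_train_len):
    """Put short sentences in the training set and long ones in the test set.

    Length is measured in whitespace-separated tokens, which is a
    simplifying assumption made for this sketch.
    """
    train, test = [], []
    for sentence in sentences:
        n_tokens = len(sentence.split())
        if n_tokens <= max_train_len:
            train.append(sentence)
        else:
            test.append(sentence)
    return train, test


# Toy data, invented for the example.
sentences = [
    "Short one .",
    "A slightly longer sentence here .",
    "Tiny .",
    "This sentence is the longest of all the toy examples .",
    "Medium length sentence .",
]
train, test = length_biased_split(sentences, max_train_len=4)
```

With this toy data, the three sentences of at most four tokens end up in `train` and the two longer ones in `test`.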
Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data
Findings of EMNLP 2020
Preview abstract
Neural text generation (data- or text-to-text) demonstrates remarkable performance when training data is abundant, which for many applications is not the case. To collect a large corpus of parallel data, heuristic rules are often used, but they inevitably let noise into the data, such as phrases in the output which cannot be explained by the input. Consequently, models pick up on the noise and may hallucinate, i.e., generate fluent but unsupported text.
Our contribution is a simple but powerful technique to control such hallucinations without dismissing any input and without modifying the model architecture. On the WikiBio corpus (Lebret et al., 2016), a particularly noisy dataset, we demonstrate the efficacy of the technique both in an automatic and in a human evaluation.
The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
Proceedings of the 2020 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
Preview abstract
There is a recent surge of papers that focus on attention as explanation of model predictions, giving mixed evidence on whether attention can be used as such. This has led some to try and 'improve' attention so as to make it more interpretable. We argue that we should pay attention no heed.
While attention conveniently gives us one weight per input token and is easily extracted, it is often unclear towards what goal it is used as explanation. We argue that often that goal, whether explicitly stated or not, is to find out what input tokens are the most relevant to a prediction. When that is the case, input saliency methods better suit our needs, and there are no compelling reasons to use attention, despite the coincidence that it provides a weight for each input. With this position paper, we hope to shift some of the recent focus on attention to saliency methods, and for authors to clearly state the goal for their explanations.
Preview abstract
In this paper we study various flavors of variational autoencoders, address methodological issues in current neural text generation research, and close some gaps by answering a few natural questions raised by the studies already published.
Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!
Katharina Kann
Proceedings of the 22nd Conference on Computational Natural Language Learning, Association for Computational Linguistics, Brussels, Belgium (2018), pp. 313-323
Preview abstract
Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.
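To make "a normalized language model score" concrete, the sketch below assumes the usual definition of SLOR: the sentence log-probability under a language model, minus the sum of the tokens' unigram log-probabilities, divided by the sentence length in tokens. The function name `slor` and all probabilities are hypothetical toy values chosen for illustration; a real setup would obtain them from a trained language model and corpus counts.

```python
import math


def slor(sentence_logprob, tokens, unigram_logprobs):
    """Syntactic log-odds ratio under the assumed definition:

        (ln p_LM(sentence) - sum of ln p_unigram(token)) / number of tokens
    """
    if not tokens:
        raise ValueError("SLOR is undefined for an empty sentence")
    unigram_total = sum(unigram_logprobs[token] for token in tokens)
    return (sentence_logprob - unigram_total) / len(tokens)


# Hypothetical toy values, not taken from any real model.
tokens = ["the", "cat"]
unigram_logprobs = {"the": math.log(0.1), "cat": math.log(0.01)}
sentence_logprob = math.log(0.004)  # pretend language model score

score = slor(sentence_logprob, tokens, unigram_logprobs)
```

Subtracting the unigram term keeps a sentence from being penalized merely for containing rare words, and dividing by the length makes scores comparable across sentences of different lengths.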
Sentence Compression by Deletion with LSTMs
Lukasz Kaiser
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP'15)
Preview abstract
We present an LSTM approach to deletion-based sentence compression where the task is to translate a sentence into a sequence of zeros and ones, corresponding to token deletion decisions. We demonstrate that even the most basic version of the system, which is given no syntactic information (no PoS or NE tags, or dependencies) or desired compression length, performs surprisingly well: around 30% of the compressions from a large test set could be regenerated. We compare the LSTM system with a competitive baseline which is trained on the same amount of data but is additionally provided with all kinds of linguistic features. In an experiment with human raters the LSTM-based model outperforms the baseline achieving 4.5 in readability and 3.8 in informativeness.
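The output format described in the abstract, one binary decision per token, can be illustrated with a small sketch. It only shows how a label sequence turns a sentence into its compression, not the LSTM that predicts the labels. The abstract does not say which value means "keep", so the convention here (1 keeps the token, 0 deletes it) is an assumption, as are the function name `apply_deletion_labels` and the toy sentence.

```python
def apply_deletion_labels(tokens, labels, keep_label=1):
    """Return the tokens whose label equals keep_label, in original order.

    Assumption for this sketch: 1 means "keep" and 0 means "delete".
    """
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    return [token for token, label in zip(tokens, labels) if label == keep_label]


# Toy sentence and hand-written labels, invented for the example.
tokens = ["The", "old", "bridge", "was", "finally", "closed", "yesterday", "."]
labels = [1, 0, 1, 1, 0, 1, 0, 1]

compressed = apply_deletion_labels(tokens, labels)
compressed_sentence = " ".join(compressed)
```

Because the compression is a subsequence of the input, the model never has to generate words that are not already in the sentence.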