Nithum Thain
Authored Publications
Plex: Towards Reliability using Pretrained Large Model Extensions
Du Phan
Mark Patrick Collier
Zi Wang
Zelda Mariet
Clara Huiyi Hu
Neil Band
Tim G. J. Rudner
Karan Singhal
Joost van Amersfoort
Andreas Christian Kirsch
Rodolphe Jenatton
Honglin Yuan
Kelly Buchanan
Yarin Gal
ICML 2022 Pre-training Workshop (2022)
A recent trend in artificial intelligence (AI) is the use of pretrained models for language and vision tasks, which has achieved extraordinary performance but also puzzling failures. Examining tasks that probe the model's abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks such as uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot learning). We devise 11 types of tasks over 36 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we develop ViT-Plex and T5-Plex, pretrained large-model extensions (henceforth abbreviated as Plex) for vision and language modalities. Plex greatly improves the state-of-the-art across tasks, and as a pretrained model Plex unifies the traditional protocol of designing and tuning one model for each reliability task. We demonstrate scaling effects over model sizes and pretraining dataset sizes up to 4 billion examples. We also demonstrate Plex's capabilities on new tasks including zero-shot open set recognition, few-shot uncertainty, and uncertainty in conversational language understanding.
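Selective prediction, one of the uncertainty tasks listed above, can be illustrated with a short sketch: abstain on the least-confident predictions and measure accuracy on what remains. The confidence scores and correctness labels below are synthetic placeholders, not Plex outputs.

# Minimal sketch of a selective-prediction evaluation of the kind used as one
# reliability task; the model outputs here are simulated, not from Plex.
import numpy as np

def selective_accuracy(confidences, correct, coverage):
    """Accuracy on the `coverage` fraction of examples the model is most confident about."""
    order = np.argsort(-confidences)        # most confident first
    k = max(1, int(coverage * len(order)))  # number of examples kept
    return correct[order[:k]].mean()

# Toy predictions: confidence scores and whether each prediction was correct.
rng = np.random.default_rng(0)
conf = rng.uniform(size=1000)
correct = (rng.uniform(size=1000) < conf).astype(float)  # correctness correlates with confidence

for cov in (1.0, 0.8, 0.5):
    print(f"coverage={cov:.1f}  accuracy={selective_accuracy(conf, correct, cov):.3f}")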
Developing robust NLP models that perform well on many, even small, slices of data is a difficult but important challenge, with implications from fairness to general reliability. To this end, recent research has explored how models rely on spurious correlations, and how counterfactual data augmentation (CDA) can mitigate such issues. In this paper, we study how and why modeling counterfactuals over multiple attributes can go significantly further in improving model performance. We propose RDI, a context-aware methodology which takes into account the impact of secondary attributes on the model's predictions and increases sensitivity for secondary attributes over reweighted counterfactually augmented data. By implementing RDI in the context of toxicity detection, we find that accounting for secondary attributes can significantly improve robustness, with improvements in sliced accuracy on the original dataset of up to 7% compared to existing robustness methods. We also demonstrate that RDI generalizes to the coreference resolution task and provide guidelines to extend this to other tasks.
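The abstract does not spell out RDI itself, but the counterfactual data augmentation it builds on can be sketched as follows: swap identity attribute terms to create a counterfactual copy of each example and attach a per-example weight. The attribute pairs and the weight value below are illustrative placeholders, not the paper's configuration.

# Illustrative sketch of counterfactual data augmentation over identity
# attributes, the ingredient RDI reweights; the swap list and weight are
# hypothetical placeholders.
import re

ATTRIBUTE_SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his"}

def counterfactual(text):
    """Swap each attribute term for its counterpart, preserving other tokens."""
    tokens = re.findall(r"\w+|\W+", text)
    return "".join(ATTRIBUTE_SWAPS.get(t.lower(), t) for t in tokens)

def augment(dataset, weight=1.0):
    """Return original examples plus weighted counterfactual copies."""
    augmented = []
    for text, label in dataset:
        augmented.append((text, label, 1.0))
        augmented.append((counterfactual(text), label, weight))  # same label, reweighted copy
    return augmented

print(augment([("she is a great engineer", 0)], weight=0.5))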
Practical Compositional Fairness: Understanding Fairness in Multi-Component Recommender Systems
Anu Aradhana Sinha
Alex Beutel
WSDM 2021
Most literature in fairness has focused on improving fairness with respect to a single model or a single objective. However, real-world machine learning systems are usually composed of many different components. Unfortunately, recent research has shown that even if each component is "fair", the overall system can still be "unfair". In this paper, we focus on how well fairness composes over multiple components in real systems. We consider two recently proposed fairness metrics for rankings: exposure and pairwise ranking accuracy gap. We provide theory that demonstrates a set of conditions under which fairness of individual models does compose. We then present an analytical framework for both understanding whether a system's signals can achieve compositional fairness, and diagnosing which of these signals lowers the overall system's end-to-end fairness the most. Despite previously bleak theoretical results, on multiple datasets -- including a large-scale real-world recommender system -- we find that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components.
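The exposure metric mentioned above can be sketched for a single ranking by summing a position-discounted share of attention per group. The logarithmic discount and the toy ranking below are illustrative choices rather than the paper's exact setup.

# Sketch of group exposure in a ranking: each position contributes a
# logarithmically discounted amount of attention, summed per group.
import math
from collections import defaultdict

def group_exposure(ranking):
    """ranking: list of group labels ordered from position 1 downward."""
    exposure = defaultdict(float)
    for position, group in enumerate(ranking, start=1):
        exposure[group] += 1.0 / math.log2(position + 1)
    total = sum(exposure.values())
    return {g: e / total for g, e in exposure.items()}  # normalized share of exposure

print(group_exposure(["A", "A", "B", "A", "B", "B"]))
# A fair system would give each group exposure roughly proportional to its relevance.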
Fairness without Demographics through Adversarially Reweighted Learning
Alex Beutel
Kang Lee
Advances in Neural Information Processing Systems 33 (2020)
Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that ARL improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives.
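A rough sketch of the adversarial reweighting idea follows: a learner minimizes a weighted loss while an adversary, which sees only non-protected features and the label, learns the weights so as to emphasize high-loss examples. The architectures, optimizer settings, and weight normalization below are illustrative guesses, not the paper's exact recipe.

# Rough sketch of adversarially reweighted training on toy data; every
# modeling choice here is a placeholder, not ARL's published configuration.
import torch
import torch.nn as nn

d = 10
learner = nn.Linear(d, 1)
adversary = nn.Linear(d + 1, 1)          # sees features and label, not group membership
opt_l = torch.optim.Adam(learner.parameters(), lr=1e-2)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss(reduction="none")

x = torch.randn(256, d)
y = (x[:, 0] > 0).float().unsqueeze(1)   # toy labels

for step in range(200):
    # Adversary proposes per-example weights from (x, y), normalized so they
    # stay on a comparable scale across batches and never vanish.
    raw = torch.sigmoid(adversary(torch.cat([x, y], dim=1)))
    weights = 1.0 + raw / (raw.mean() + 1e-8)

    losses = bce(learner(x), y)

    # Learner step: minimize the weighted loss (weights treated as constants).
    opt_l.zero_grad()
    (weights.detach() * losses).mean().backward()
    opt_l.step()

    # Adversary step: maximize the same weighted loss (losses treated as constants).
    opt_a.zero_grad()
    (-(weights * losses.detach()).mean()).backward()
    opt_a.step()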
Debiasing Embeddings for Fairer Text Classification
1st ACL Workshop on Gender Bias for Natural Language Processing (2019)
Bolukbasi et al. (2016) demonstrated that pre-trained word embeddings can inherit gender bias from the data they were trained on. We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al., 2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by providing a less noisy channel for communicating gender information. With a relatively minor adjustment, however, we show how these same techniques can be used to simultaneously reduce bias and obtain high classification accuracy.
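The debiasing step under study can be sketched as removing each embedding's component along a gender direction. Estimating that direction from a single he/she difference, as below, is a simplification of Bolukbasi et al.'s PCA over definitional pairs, and the random vectors stand in for real embeddings.

# Sketch of hard debiasing: project every embedding onto the hyperplane
# orthogonal to an (approximate) gender direction. Vectors here are random.
import numpy as np

def debias(embeddings, direction):
    """Remove each row's component along `direction`."""
    g = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ g, g)

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=50) for w in ["he", "she", "doctor", "nurse"]}
gender_dir = vocab["he"] - vocab["she"]

E = np.stack(list(vocab.values()))
E_debiased = debias(E, gender_dir)
print(np.round(E_debiased @ (gender_dir / np.linalg.norm(gender_dir)), 6))  # ~0 for every word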
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
Daniel Borkan
ACM Conference on Fairness, Accountability, and Transparency (2019) (to appear)
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
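Two of the threshold-agnostic metrics of this kind, a subgroup AUC and a BPSN (background positive, subgroup negative) AUC, can be sketched as AUCs over restricted slices of the test set; the toy labels, scores, and subgroup mask below are illustrative only.

# Sketch of two threshold-agnostic bias metrics computed on toy data.
import numpy as np
from sklearn.metrics import roc_auc_score

def subgroup_auc(labels, scores, in_subgroup):
    """AUC restricted to comments that mention the identity."""
    return roc_auc_score(labels[in_subgroup], scores[in_subgroup])

def bpsn_auc(labels, scores, in_subgroup):
    """AUC over background positives and subgroup negatives: low values mean the
    model tends to score non-toxic subgroup comments above toxic background ones."""
    mask = (~in_subgroup & (labels == 1)) | (in_subgroup & (labels == 0))
    return roc_auc_score(labels[mask], scores[mask])

labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])
scores = np.array([0.9, 0.2, 0.8, 0.6, 0.7, 0.5, 0.6, 0.1])
in_subgroup = np.array([False, False, False, True, False, True, True, True])

print("subgroup AUC:", subgroup_auc(labels, scores, in_subgroup))
print("BPSN AUC:", bpsn_auc(labels, scores, in_subgroup))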
Conversations Gone Awry: Detecting Warning Signs of Conversational Failure
Justine Zhang
Jonathan P. Chang
Cristian Danescu-Niculescu-Mizil
Dario Taraborelli
Proceedings of ACL (2018)
One of the main challenges online social systems face today is the prevalence of toxic behavior, such as harassment and personal attacks. This type of antisocial behavior is especially perplexing and disruptive when it emerges in the context of healthy conversations where, at least in principle, participants share a common goal and set of norms. In this work, we introduce the task of predicting whether a given conversation is on the verge of being derailed by the antisocial actions of one of its participants. As opposed to detecting toxic behavior after the fact, this task aims to enable early, actionable information at a time when the conversation might still be salvaged. We focus on two methodological challenges. First, through a combination of machine learning, crowd-sourcing and causal inference techniques applied to a novel dataset of 8 million conversations, we design a controlled setting that allows us to compare healthy conversations that deteriorate with similar conversations that stay on track, while accounting for confounding factors such as topical focus and number of participants. Second, we propose a framework for applying and evaluating linguistic, conversational and social patterns in the task of predicting the future trajectory of a conversation. Our primary result is that a simple model using conversational and linguistic features can achieve performance close to that of humans in predicting whether a civil conversation will go awry. We also show that the conversational context is more informative in this task than the history and experience of the participants. By demonstrating the feasibility of the prediction task, and by providing a labeled dataset as well as a human baseline, we lay the groundwork for further work on methods for detecting early warning signs of, and eventually preventing, antisocial behavior in online discussions.
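The "simple model using conversational and linguistic features" in the result above could look, in spirit, like a logistic regression over features of a conversation's opening exchange. The features and toy conversations below are placeholders, not the feature set or data used in the paper.

# Hedged sketch of a simple feature-based derailment classifier on toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def conversation_features(first_two_comments):
    """Toy conversational/linguistic features for the opening exchange."""
    a, b = first_two_comments
    return [
        len(a.split()), len(b.split()),  # verbosity
        a.count("?"), b.count("?"),      # direct questions
        int("you" in b.lower()),         # second-person framing in the reply
    ]

# Placeholder dataset: (first two comments, did the conversation later derail?)
conversations = [
    (("Could we cite a source for this?", "Sure, I will add one."), 0),
    (("Why did you delete my edit?", "Because you clearly don't understand the topic."), 1),
] * 20

X = np.array([conversation_features(c) for c, _ in conversations])
y = np.array([label for _, label in conversations])
print(cross_val_score(LogisticRegression(), X, y, cv=5).mean())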
Measuring and Mitigating Unintended Bias in Text Classification
John Li
AAAI/ACM Conference on AI, Ethics, and Society (2018)
We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
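The synthetic, template-based evaluation described above can be sketched by filling neutral and hostile sentence templates with identity terms and comparing the classifier's scores per term. The templates, identity terms, and stand-in scoring function below are placeholders, not the released test set or a real model.

# Sketch of a template-based bias check; a biased stand-in "classifier" keys
# on an identity term alone, which the per-term comparison makes visible.
TEMPLATES = [("I am a proud {} person.", 0), ("All {} people are terrible.", 1)]
IDENTITY_TERMS = ["gay", "straight", "muslim", "christian"]

def toxicity_score(text):
    """Stand-in for a real classifier; deliberately biased for illustration."""
    return 0.9 if "gay" in text or "terrible" in text.lower() else 0.1

for term in IDENTITY_TERMS:
    scores = {label: toxicity_score(tpl.format(term)) for tpl, label in TEMPLATES}
    print(f"{term:>10}: non-toxic template -> {scores[0]:.1f}, toxic template -> {scores[1]:.1f}")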
WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community
Cristian Danescu-Niculescu-Mizil
Dario Taraborelli
Yiqing Hua
ACL (2018)
We present a corpus that encompasses the complete history of conversations between contributors of English Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations---including not only comments and replies, but also their modifications, deletions and restorations---this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus' potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person's conversational behavior depends on how they relate to the discussion venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated.
Ex Machina: Personal Attacks Seen at Scale
Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1391-1399
The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.
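The evaluation idea, comparing a classifier's scores against aggregated crowd-worker judgments via ROC AUC and Spearman correlation, can be sketched with synthetic data; the worker annotations and model scores below are placeholders, not the released corpus.

# Sketch of scoring a classifier against aggregated crowd-worker labels.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
true_attack_rate = rng.uniform(size=500)                    # latent per-comment attack probability
worker_votes = rng.binomial(n=10, p=true_attack_rate) / 10  # fraction of 10 workers flagging an attack
model_scores = np.clip(true_attack_rate + rng.normal(0, 0.15, 500), 0, 1)

binary_labels = (worker_votes >= 0.5).astype(int)           # majority-vote reference
print("ROC AUC :", round(roc_auc_score(binary_labels, model_scores), 3))
print("Spearman:", round(spearmanr(worker_votes, model_scores).correlation, 3))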