Nithum Thain
Authored Publications
Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consisting of constitutional principles (i.e. rules), given a training dataset. Unlike prior methods that optimize the prompt as a single entity, our method incrementally improves the prompt by surgically editing individual principles. We also show that we can improve overall performance by learning unique prompts for different semantic regions of the training data and using a mixture-of-experts (MoE) architecture to route inputs at inference time. We compare ConstitutionalExperts to other state-of-the-art prompt-optimization techniques across six benchmark datasets. We also investigate whether MoE improves these other techniques. Our results suggest that ConstitutionalExperts outperforms other prompt optimization techniques and that mixture-of-experts improves all techniques on average by 4.7%, suggesting broader applicability.
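A rough, hypothetical sketch of the mixture-of-experts routing idea described in this abstract (not the authors' implementation): cluster the training data by embedding, keep one learned prompt per cluster, and route each new input to the prompt of its nearest cluster. The embed() function, the example prompts, and the data below are all placeholder assumptions.

    # Illustrative sketch of MoE-style prompt routing; everything here is a toy stand-in.
    import numpy as np
    from sklearn.cluster import KMeans

    def embed(texts):
        # Placeholder embedding (character-count features) just to keep the sketch runnable;
        # a real system would use a sentence-embedding model here.
        vecs = np.zeros((len(texts), 128))
        for i, t in enumerate(texts):
            for ch in t.lower():
                vecs[i, ord(ch) % 128] += 1.0
        return vecs / np.maximum(vecs.sum(axis=1, keepdims=True), 1.0)

    # Hypothetical per-region prompts, e.g. learned separately for each cluster.
    expert_prompts = [
        "Rule 1: flag messages containing personal attacks.",
        "Rule 1: flag messages containing explicit threats.",
    ]

    train_texts = ["you are an idiot", "what a moron", "I will find you", "I'll hurt you"]
    kmeans = KMeans(n_clusters=len(expert_prompts), n_init=10, random_state=0).fit(embed(train_texts))

    def route(text):
        """Return the expert prompt whose cluster centroid is closest to the input."""
        return expert_prompts[int(kmeans.predict(embed([text]))[0])]

    print(route("you absolute moron"))  # routed to the expert covering that semantic region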
An Explorable explaining the concept of Patchscopes for an external audience. Patchscopes is an interpretability tool that allows researchers to better understand an LLM's output representations through natural language experiments.
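As a very loose, toy illustration of the underlying patching mechanic (capture a hidden representation from one forward pass and inject it into another, then inspect what the model produces with it), here is a minimal PyTorch sketch on a stand-in network; it is not the Patchscopes tool and uses no real LLM.

    # Toy activation-patching sketch on a tiny stand-in network (not a real LLM).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
    captured = {}

    def capture_hook(module, inputs, output):
        captured["hidden"] = output.detach()

    def patch_hook(module, inputs, output):
        return captured["hidden"]  # replace this layer's output with the captured state

    source_input, target_input = torch.randn(1, 8), torch.randn(1, 8)

    # 1) Run the source input and capture an intermediate representation.
    handle = model[0].register_forward_hook(capture_hook)
    model(source_input)
    handle.remove()

    # 2) Run the target input with the captured representation patched in,
    #    then inspect how the output changes.
    handle = model[0].register_forward_hook(patch_hook)
    print(model(target_input))  # reflects the source input's hidden state, not the target's
    handle.remove()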
Plex: Towards Reliability using Pretrained Large Model Extensions
Dustin Tran
Du Phan
Mark Patrick Collier
Zi Wang
Zelda Mariet
Clara Huiyi Hu
Neil Band
Tim G. J. Rudner
Karan Singhal
Joost van Amersfoort
Andreas Christian Kirsch
Rodolphe Jenatton
Honglin Yuan
Kelly Buchanan
D. Sculley
Yarin Gal
ICML 2022 Pre-training Workshop (2022)
A recent trend in artificial intelligence (AI) is the use of pretrained models for language and vision tasks, which has achieved extraordinary performance but also puzzling failures. Examining tasks that probe the model’s abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks such as uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot learning). We devise 11 types of tasks over 36 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large-model extensions (henceforth abbreviated as plex) for vision and language modalities. Plex greatly improves the state-of-the-art across tasks, and as a pretrained model Plex unifies the traditional protocol of designing and tuning one model for each reliability task. We demonstrate scaling effects over model sizes and pretraining dataset sizes up to 4 billion examples. We also demonstrate Plex’s capabilities on new tasks including zero-shot open set recognition, few-shot uncertainty, and uncertainty in conversational language understanding.
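One of the reliability tasks listed above, selective prediction, is simple to state concretely: let the model abstain on low-confidence inputs and trade coverage for accuracy. The sketch below evaluates that trade-off on random toy predictions; it is only an illustration of the task, not Plex or its evaluation code.

    # Toy selective-prediction evaluation: abstain below a confidence threshold.
    import numpy as np

    def selective_accuracy(probs, labels, threshold):
        """probs: (n, classes) predicted probabilities; labels: (n,) true classes."""
        confidence = probs.max(axis=1)
        keep = confidence >= threshold
        if keep.sum() == 0:
            return 0.0, float("nan")
        coverage = keep.mean()
        accuracy = (probs[keep].argmax(axis=1) == labels[keep]).mean()
        return coverage, accuracy

    rng = np.random.default_rng(0)
    probs = rng.dirichlet([1.0, 1.0, 1.0], size=1000)  # fake classifier outputs
    labels = rng.integers(0, 3, size=1000)             # fake ground truth
    for t in (0.0, 0.5, 0.8):
        cov, acc = selective_accuracy(probs, labels, t)
        print(f"threshold={t:.1f} coverage={cov:.2f} accuracy={acc:.2f}")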
Developing robust NLP models that perform well on many, even small, slices of data is a difficult but important challenge, with implications from fairness to general reliability. To this end, recent research has explored how models rely on spurious correlations, and how counterfactual data augmentation (CDA) can mitigate such issues. In this paper we study how and why modeling counterfactuals over multiple attributes can go significantly further in improving model performance. We propose RDI, a context-aware methodology which takes into account the impact of secondary attributes on the model’s predictions and increases sensitivity for secondary attributes over reweighted counterfactually augmented data. By implementing RDI in the context of toxicity detection, we find that accounting for secondary attributes can significantly improve robustness, with improvements in sliced accuracy on the original dataset up to 7% compared to existing robustness methods. We also demonstrate that RDI generalizes to the coreference resolution task and provide guidelines to extend this to other tasks.
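For readers unfamiliar with counterfactual data augmentation, the hypothetical sketch below shows the multi-attribute flavor this abstract builds on: rewrite each example by swapping terms for one or more sensitive attributes. It is plain CDA over toy term lists, not the RDI reweighting method itself.

    # Minimal multi-attribute counterfactual data augmentation (CDA) sketch.
    import itertools
    import re

    ATTRIBUTE_SWAPS = {
        "gender": [("he", "she"), ("him", "her"), ("his", "her")],
        "religion": [("christian", "muslim"), ("church", "mosque")],
    }

    def swap_terms(text, pairs):
        for a, b in pairs:
            # Swap whole words in both directions, using a placeholder to avoid double swaps.
            text = re.sub(rf"\b{a}\b", "__TMP__", text)
            text = re.sub(rf"\b{b}\b", a, text)
            text = text.replace("__TMP__", b)
        return text

    def counterfactuals(text, attributes):
        """Generate counterfactuals for every non-empty subset of the given attributes."""
        results = []
        for r in range(1, len(attributes) + 1):
            for subset in itertools.combinations(attributes, r):
                out = text
                for attr in subset:
                    out = swap_terms(out, ATTRIBUTE_SWAPS[attr])
                results.append(out)
        return results

    print(counterfactuals("he went to church with his friend", ["gender", "religion"]))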
Practical Compositional Fairness: Understanding Fairness in Multi-Component Recommender Systems
Anu Aradhana Sinha
Alex Beutel
WSDM 2021
Most literature in fairness has focused on improving fairness with respect to one single model or one single objective. However, real-world machine learning systems are usually composed of many different components. Unfortunately, recent research has shown that even if each component is "fair", the overall system can still be "unfair". In this paper, we focus on how well fairness composes over multiple components in real systems. We consider two recently proposed fairness metrics for rankings: exposure and pairwise ranking accuracy gap. We provide theory that demonstrates a set of conditions under which fairness of individual models does compose. We then present an analytical framework for both understanding whether a system's signals can achieve compositional fairness, and diagnosing which of these signals lowers the overall system's end-to-end fairness the most. Despite previously bleak theoretical results, on multiple datasets -- including a large-scale real-world recommender system -- we find that the overall system's end-to-end fairness is largely achievable by improving fairness in individual components.
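As a concrete reference point for one of the two ranking-fairness metrics named above, the sketch below computes per-group exposure for a single ranked list using the standard logarithmic position discount; the group labels are toy data and this is not the paper's experimental code.

    # Per-group exposure for one ranked list, with a 1/log2(position+1) discount.
    import math
    from collections import defaultdict

    def exposure_by_group(ranking):
        """ranking: list of group labels ordered from position 1 (top) downward."""
        totals = defaultdict(float)
        for position, group in enumerate(ranking, start=1):
            totals[group] += 1.0 / math.log2(position + 1)
        total = sum(totals.values())
        return {g: v / total for g, v in totals.items()}

    # The end-to-end ranking is what users see, even if each component looked fair.
    print(exposure_by_group(["A", "B", "A", "B", "B", "A"]))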
Fairness without Demographics through Adversarially Reweighted Learning
Alex Beutel
Kang Lee
Advances in Neural Information Processing Systems 33 (2020)
Much of the previous machine learning (ML) fairness literature assumes that protected features such as race and sex are present in the dataset, and relies upon them to mitigate fairness concerns. However, in practice factors like privacy and regulation often preclude the collection of protected features, or their use for training or inference, severely limiting the applicability of traditional fairness research. Therefore we ask: How can we train an ML model to improve fairness when we do not even know the protected group memberships? In this work we address this problem by proposing Adversarially Reweighted Learning (ARL). In particular, we hypothesize that non-protected features and task labels are valuable for identifying fairness issues, and can be used to co-train an adversarial reweighting approach for improving fairness. Our results show that ARL improves Rawlsian Max-Min fairness, with notable AUC improvements for worst-case protected groups in multiple datasets, outperforming state-of-the-art alternatives.
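A hedged sketch of what an adversarially reweighted training loop can look like, in the spirit of ARL: a learner minimizes a per-example weighted loss while an adversary, which sees only non-protected features and the label, learns to up-weight the examples the learner gets wrong. The data, architectures, and hyperparameters below are toy assumptions, not the authors' implementation.

    # Toy adversarially reweighted training loop (ARL-style min-max game).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, d = 512, 10
    X = torch.randn(n, d)
    y = (X[:, 0] + 0.5 * torch.randn(n) > 0).float()

    learner = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
    # The adversary sees non-protected features plus the label, never group membership.
    adversary = nn.Sequential(nn.Linear(d + 1, 32), nn.ReLU(), nn.Linear(32, 1))
    opt_l = torch.optim.Adam(learner.parameters(), lr=1e-3)
    opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss(reduction="none")

    for step in range(200):
        per_example_loss = bce(learner(X).squeeze(1), y)

        # Weight each example as 1 plus its normalized adversary score (scaled by n),
        # so every example keeps a baseline weight of 1.
        scores = torch.sigmoid(adversary(torch.cat([X, y.unsqueeze(1)], dim=1))).squeeze(1)
        weights = 1.0 + len(y) * scores / scores.sum()

        # Learner step: minimize the weighted loss (weights treated as constants).
        learner_loss = (weights.detach() * per_example_loss).mean()
        opt_l.zero_grad()
        learner_loss.backward()
        opt_l.step()

        # Adversary step: maximize the weighted loss, i.e. up-weight hard examples.
        adversary_loss = -(weights * per_example_loss.detach()).mean()
        opt_a.zero_grad()
        adversary_loss.backward()
        opt_a.step()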
Debiasing Embeddings for Fairer Text Classification
Tolga Bolukbasi
1st ACL Workshop on Gender Bias for Natural Language Processing (2019)
(Bolukbasi et al., 2016) demonstrated that pre-trained word embeddings can inherit gender bias from the data they were trained on. We investigate how this bias affects downstream classification tasks, using the case study of occupation classification (De-Arteaga et al., 2019). We show that traditional techniques for debiasing embeddings can actually worsen the bias of the downstream classifier by providing a less noisy channel for communicating gender information. With a relatively minor adjustment, however, we show how these same techniques can be used to simultaneously reduce bias and obtain high classification accuracy.
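The embedding-debiasing techniques referred to here include the classic projection step from Bolukbasi et al. (2016): remove the component of each word vector along an estimated gender direction. A minimal sketch with random stand-in vectors (not real pre-trained embeddings):

    # Hard-debiasing projection: v' = v - (v . g) g, for an estimated bias direction g.
    import numpy as np

    rng = np.random.default_rng(0)
    embeddings = {w: rng.normal(size=50) for w in ["doctor", "nurse", "he", "she"]}

    # Estimate a gender direction from one definitional pair (real work uses several pairs + PCA).
    g = embeddings["he"] - embeddings["she"]
    g = g / np.linalg.norm(g)

    def debias(vec, direction):
        """Project out the bias direction from a word vector."""
        return vec - np.dot(vec, direction) * direction

    debiased = {w: debias(v, g) for w, v in embeddings.items()}
    print(np.dot(debiased["doctor"], g))  # ~0: no remaining component along the gender direction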
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
Daniel Borkan
ACM Conference on Fairness, Accountability, and Transparency (2019)
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
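Three of the threshold-agnostic metrics in this line of work (Subgroup AUC, BPSN AUC, and BNSP AUC) compare score distributions across a subgroup and the background. The sketch below computes them with scikit-learn on toy arrays; the simulated scorer is deliberately biased so the metrics have something to detect.

    # Subgroup / BPSN / BNSP AUCs on toy labels, scores, and subgroup membership.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def bias_aucs(labels, scores, in_subgroup):
        labels, scores, in_subgroup = map(np.asarray, (labels, scores, in_subgroup))
        subgroup = roc_auc_score(labels[in_subgroup], scores[in_subgroup])
        # BPSN: background positives vs. subgroup negatives.
        bpsn_mask = (in_subgroup & (labels == 0)) | (~in_subgroup & (labels == 1))
        bpsn = roc_auc_score(labels[bpsn_mask], scores[bpsn_mask])
        # BNSP: background negatives vs. subgroup positives.
        bnsp_mask = (in_subgroup & (labels == 1)) | (~in_subgroup & (labels == 0))
        bnsp = roc_auc_score(labels[bnsp_mask], scores[bnsp_mask])
        return {"subgroup_auc": subgroup, "bpsn_auc": bpsn, "bnsp_auc": bnsp}

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=200)
    in_subgroup = rng.random(200) < 0.3
    scores = labels * 0.6 + in_subgroup * 0.2 + rng.random(200) * 0.4  # biased toy scorer
    print(bias_aucs(labels, scores, in_subgroup))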
Measuring and Mitigating Unintended Bias in Text Classification
John Li
AAAI/ACM Conference on AI, Ethics, and Society (2018)
We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
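A minimal sketch of the synthetic-test-set idea: fill toxic and non-toxic templates with identity terms and score a classifier separately on each term's slice, so that a term-specific false-positive tendency shows up as a depressed per-term AUC. The templates, terms, and stand-in scorer below are illustrative assumptions, not the paper's data or model.

    # Template-based synthetic test set with per-identity-term AUC.
    from sklearn.metrics import roc_auc_score

    TEMPLATES = [("I hate all {}", 1), ("Being {} is wonderful", 0), ("Some people are {}", 0)]
    IDENTITY_TERMS = ["gay", "straight", "muslim", "christian", "american"]

    def synthetic_test_set():
        for term in IDENTITY_TERMS:
            for template, label in TEMPLATES:
                yield term, template.format(term), label

    # Stand-in scorer that (wrongly) treats one identity term itself as a toxicity signal.
    score = lambda text: 0.9 if "gay" in text else (0.8 if "hate" in text else 0.1)

    for term in IDENTITY_TERMS:
        rows = [(text, label) for t, text, label in synthetic_test_set() if t == term]
        labels = [label for _, label in rows]
        scores = [score(text) for text, _ in rows]
        print(term, round(roc_auc_score(labels, scores), 2))  # low AUC flags unintended bias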