Nicholas Carlini
Authored Publications
Identifying and Mitigating the Security Risks of Generative AI
Clark Barrett
Brad Boyd
Brad Chen
Jihye Choi
Amrita Roy Chowdhury
Anupam Datta
Soheil Feizi
Kathleen Fisher
Tatsunori B. Hashimoto
Dan Hendrycks
Somesh Jha
Daniel Kang
Florian Kerschbaum
Eric Mitchell
John Mitchell
Zulfikar Ramzan
Khawaja Shams
Dawn Song
Ankur Taly
Diyi Yang
Foundations and Trends in Privacy and Security, 6 (2023), pp. 1-52
Abstract
Every major technical invention resurfaces the dual-use dilemma—the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code-completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase
the velocity and efficacy of existing attacks. This paper reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. The paper is not meant to be comprehensive; rather, it reports some of the interesting findings from the workshop. We discuss short-term and long-term goals for the community on this topic. We hope this paper serves as a launching point for this important topic and presents interesting problems that the research community can work to address.
Deduplicating Training Data Makes Language Models Better
Andrew Nystrom
Chiyuan Zhang
Chris Callison-Burch
(2022) (to appear)
Abstract
As large language models scale up, researchers and engineers have chosen to use larger datasets of loosely-filtered internet text instead of curated texts.
We find that existing NLP datasets are highly repetitive and contain duplicated examples.
For example, there is an example in the training dataset C4 that has over 200,000 near duplicates.
As a whole, we find that 1.68% of C4 consists of near-duplicates.
Worse, we find a 1% overlap between the training and testing sets in these datasets.
Duplicate examples in training data inappropriately bias the distribution of rare and common sequences.
Models trained on non-deduplicated datasets are more likely to generate "memorized" examples.
Additionally, if those models are used for downstream applications, such as scoring the likelihoods of given sequences, we find that models trained on non-deduplicated and deduplicated datasets differ in accuracy by TODO on average.
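One standard way to find near-duplicate documents at scale is MinHash over word n-gram shingles, which approximates Jaccard similarity between documents. The sketch below is a minimal, self-contained illustration of that idea with arbitrary shingle size, signature length, and threshold; it is not the exact deduplication pipeline used in the paper.

```python
import hashlib

def shingles(text, n=5):
    """Return the set of word n-grams ("shingles") in a document."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def minhash_signature(shingle_set, num_hashes=128):
    """MinHash signature: for each seeded hash function, keep the minimum
    hash value over all shingles of the document."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in shingle_set)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "the quick brown fox jumps over the lazy dog near the river bend"
sim = estimated_jaccard(minhash_signature(shingles(doc1)),
                        minhash_signature(shingles(doc2)))
print(f"estimated Jaccard similarity: {sim:.2f}")  # flag as near-duplicate above a chosen threshold, e.g. 0.8
```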
AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation
Abstract
We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one. With the goal of generality, we introduce AdaMatch, a method that unifies the tasks of unsupervised domain adaptation (UDA), semi-supervised learning (SSL), and semi-supervised domain adaptation (SSDA). In an extensive experimental study, we compare its behavior with respective state-of-the-art techniques from SSL, SSDA, and UDA on vision classification tasks. We find AdaMatch either matches or significantly exceeds the state-of-the-art in each case using the same hyper-parameters regardless of the dataset or task. For example, AdaMatch nearly doubles the accuracy compared to that of the prior state-of-the-art on the UDA task for DomainNet and even exceeds the accuracy of the prior state-of-the-art obtained with pre-training by 6.4% when AdaMatch is trained completely from scratch. Furthermore, by providing AdaMatch with just one labeled example per class from the target domain (i.e., the SSDA setting), we increase the target accuracy by an additional 6.1%, and with 5 labeled examples, by 13.6%.
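One ingredient that helps AdaMatch use the same hyper-parameters across tasks is a relative confidence threshold: the cutoff for accepting a pseudo-label on unlabeled target data is set relative to the model's average confidence on the labeled source batch, rather than fixed. The numpy sketch below illustrates that idea; the function name and the tau value are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def relative_confidence_mask(source_logits, target_logits, tau=0.9):
    """Keep a target pseudo-label only if its confidence exceeds
    tau * (mean top-class confidence on the labeled source batch)."""
    source_conf = softmax(source_logits).max(axis=1)   # confidence on labeled source data
    threshold = tau * source_conf.mean()               # relative, not fixed, cutoff
    target_probs = softmax(target_logits)
    pseudo_labels = target_probs.argmax(axis=1)
    mask = target_probs.max(axis=1) >= threshold       # which unlabeled examples to train on
    return pseudo_labels, mask
```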
Poisoning the Unlabeled Dataset of Semi-Supervised Learning
USENIX Security (2021)
Abstract
Semi-supervised machine learning models learn from a (small) set of labeled training examples, and a (large) set of unlabeled training examples. State-of-the-art models can reach within a few percentage points of fully-supervised training, while requiring 100x less labeled data.
We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset. In order to be useful, unlabeled datasets are given strictly less review than labeled datasets, and adversaries can therefore poison them easily. By inserting maliciously-crafted unlabeled examples totaling just 0.1% of the dataset size, we can manipulate a model trained on this poisoned dataset to misclassify arbitrary examples at test time (as any desired label). Our attacks are highly effective across datasets and semi-supervised learning methods.
We find that more accurate methods (thus more likely to be used) are significantly more vulnerable to poisoning attacks, and as such better training methods are unlikely to prevent this attack. To counter this we explore the space of defenses, and propose two methods that mitigate our attack.
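As intuition for how a 0.1%-sized unlabeled poison set can flip a prediction, the sketch below shows one schematic way such examples can be crafted: interpolate between an example of the attacker's desired class and the target input, so that consistency-based semi-supervised training propagates the label along the path. This is an intuition-level illustration with arbitrary parameters, not the paper's exact attack.

```python
import numpy as np

def make_poison_path(source_example, target_example, num_points=20):
    """Unlabeled points interpolating from an example of the attacker's desired
    class (source) to the input the attacker wants misclassified (target).
    Consistency training tends to propagate the source's label along this path."""
    alphas = np.linspace(0.0, 1.0, num_points)[:, None]
    return (1 - alphas) * source_example + alphas * target_example

# e.g. flattened images; the poison set is kept to a tiny fraction of the unlabeled data
source = np.random.rand(32 * 32 * 3)   # drawn from the class the attacker wants predicted
target = np.random.rand(32 * 32 * 3)   # the test-time input to be misclassified
poison = make_poison_path(source, target)
print(poison.shape)  # (20, 3072)
```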
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Chun-Liang Li
Colin Raffel
David Berthelot
Han Zhang
Kihyuk Sohn
NeurIPS (2020) (to appear)
Abstract
Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model’s performance. This domain has seen fast progress recently, at the cost of requiring more complex methods. In this paper we propose FixMatch, an algorithm that is a significant simplification of existing SSL methods. FixMatch first generates pseudo-labels using the model’s predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels, just 4 labels per class. We carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch’s success.
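The core unlabeled-data loss can be written in a few lines: take the model's prediction on a weakly augmented image as a hard pseudo-label, keep it only when the confidence exceeds a threshold, and apply cross-entropy against the prediction on a strongly augmented view of the same image. The numpy sketch below is a simplified illustration of that loss; treat the 0.95 threshold as a configurable value.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Cross-entropy between hard pseudo-labels from the weakly augmented view
    and predictions on the strongly augmented view, masked by confidence."""
    weak_probs = softmax(weak_logits)
    pseudo_labels = weak_probs.argmax(axis=1)          # hard pseudo-label per image
    mask = weak_probs.max(axis=1) >= threshold         # keep only confident predictions
    strong_log_probs = np.log(softmax(strong_logits) + 1e-12)
    per_example = -strong_log_probs[np.arange(len(pseudo_labels)), pseudo_labels]
    return (per_example * mask).mean()                 # averaged over the whole batch
```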
ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring
Colin Raffel
David Berthelot
Han Zhang
Kihyuk Sohn
ICLR (2020)
Abstract
We improve the recently-proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between 5x and 16x less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach 93.73% accuracy (compared to MixMatch’s accuracy of 93.58% with 4,000 examples) and a median accuracy of 84.92% with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
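Distribution alignment itself is a small operation: scale each prediction on unlabeled data by the ratio of the labeled class marginal to a running average of the model's predictions, then renormalize each row. The numpy sketch below illustrates the step on toy inputs; the inputs are illustrative only.

```python
import numpy as np

def distribution_alignment(unlabeled_probs, labeled_marginal, running_marginal):
    """Scale predictions so their marginal matches the labeled class distribution,
    then renormalize each row back to a valid probability vector."""
    aligned = unlabeled_probs * (labeled_marginal / (running_marginal + 1e-12))
    return aligned / aligned.sum(axis=1, keepdims=True)

# toy example: the model over-predicts class 0 relative to the labeled marginal
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])
labeled_marginal = np.array([1/3, 1/3, 1/3])   # ground-truth label distribution
running_marginal = probs.mean(axis=0)          # running average of model predictions
print(distribution_alignment(probs, labeled_marginal, running_marginal))
```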
MixMatch: A Holistic Approach to Semi-Supervised Learning
David Berthelot
Ian Goodfellow
Avital Oliver
Colin Raffel
NeurIPS (2019) (to appear)
Abstract
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm called MixMatch. MixMatch works by guessing low-entropy labels for data-augmented unlabeled examples, and then mixes labeled and unlabeled data using MixUp. We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success.
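The two ingredients named above, low-entropy label guessing and MixUp, are straightforward to sketch: average the model's predictions over several augmentations of an unlabeled example, sharpen the result with a temperature, and mix labeled and unlabeled pairs convexly. The numpy sketch below illustrates both steps; the temperature and Beta parameter are illustrative defaults, not a reproduction of the full training loop.

```python
import numpy as np

def sharpen(probs, temperature=0.5):
    """Lower the entropy of a guessed label by temperature sharpening."""
    p = probs ** (1.0 / temperature)
    return p / p.sum(axis=-1, keepdims=True)

def guess_labels(predictions_per_augmentation):
    """Average predictions over K augmentations of one example, then sharpen."""
    return sharpen(np.mean(predictions_per_augmentation, axis=0))

def mixup(x1, y1, x2, y2, alpha=0.75):
    """MixUp: convex combination of two (example, label) pairs."""
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1 - lam)   # keep the mix closer to the first input
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```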
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Abstract
This paper describes a testing methodology for quantitatively assessing the risk of unintended memorization of rare or unique sequences in generative sequence models, a common type of neural network. Such models are sometimes trained on sensitive data (e.g., the text of users' private messages); our methodology allows deep-learning practitioners to choose configurations that minimize memorization during training, thereby benefiting privacy.
In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, we show that if memorization is not addressed during training, new, efficient procedures can extract unique, secret sequences such as credit card numbers from trained models. We also show that our testing strategy is practical and easy to apply, e.g., by describing its use for quantitatively preventing data exposure in a production, commercial neural network: a predictive email-composition assistant trained on millions of users' email messages.
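The methodology's central quantity is a rank-based exposure metric: insert a randomly chosen "canary" sequence into the training data, then compare the trained model's log-perplexity on that canary against its log-perplexities on many random alternative candidates. The sketch below computes exposure from precomputed log-perplexities; the model-scoring step is left abstract and the toy numbers are illustrative only.

```python
import math

def exposure(canary_log_perplexity, candidate_log_perplexities):
    """Exposure of an inserted canary: log2 of the candidate-space size minus
    log2 of the canary's rank among random candidate sequences. High exposure
    means the canary is far more likely under the model than chance, i.e. memorized."""
    rank = 1 + sum(1 for c in candidate_log_perplexities if c < canary_log_perplexity)
    return math.log2(len(candidate_log_perplexities)) - math.log2(rank)

# toy numbers: the canary scores better (lower log-perplexity) than every candidate
candidates = [42.0, 40.5, 39.8, 41.2, 38.9, 43.1, 40.0, 44.4]
print(exposure(canary_log_perplexity=37.0, candidate_log_perplexities=candidates))
# 3.0 here: the canary ranks 1st among 8 candidates -> log2(8) - log2(1)
```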