Nicholas Carlini

Authored Publications
    Identifying and Mitigating the Security Risks of Generative AI
    Clark Barrett
    Brad Boyd
    Brad Chen
    Jihye Choi
    Amrita Roy Chowdhury
    Anupam Datta
    Soheil Feizi
    Kathleen Fisher
    Tatsunori B. Hashimoto
    Dan Hendrycks
    Somesh Jha
    Daniel Kang
    Florian Kerschbaum
    Eric Mitchell
    John Mitchell
    Zulfikar Ramzan
    Khawaja Shams
    Dawn Song
    Ankur Taly
    Diyi Yang
    Foundations and Trends in Privacy and Security, 6 (2023), pp. 1-52
    Every major technical invention resurfaces the dual-use dilemma: the new technology has the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such as large language models (LLMs) and diffusion models, have shown remarkable capabilities (e.g., in-context learning, code completion, and text-to-image generation and editing). However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks. This paper reports the findings of a workshop held at Google (co-organized by Stanford University and the University of Wisconsin-Madison) on the dual-use dilemma posed by GenAI. The paper is not meant to be comprehensive; rather, it reports some of the more interesting findings from the workshop and discusses short-term and long-term goals for the community on this topic. We hope it provides a launching point on this important topic and poses interesting problems that the research community can work to address.
    As large language models scale up, researchers and engineers have chosen to train on large datasets of loosely filtered internet text rather than on curated text. We find that existing NLP datasets are highly repetitive and contain many duplicated examples; for instance, one training example in C4 has over 200,000 near-duplicates, and overall 1.68% of C4 consists of near-duplicates. Worse, we find roughly 1% overlap between the training and test sets of these datasets. Duplicated training examples inappropriately bias the distribution of rare and common sequences, and models trained on non-deduplicated datasets are more likely to generate "memorized" examples. Additionally, when such models are used for downstream applications, such as scoring the likelihood of given sequences, we find that models trained on non-deduplicated versus deduplicated datasets differ in accuracy by, on average, TODO.
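    A concrete way to see what "near-duplicate" means here is approximate set similarity over n-gram shingles. The paper's own pipeline uses MinHash-based approximate matching plus suffix-array exact-substring deduplication; the snippet below is only a minimal, illustrative MinHash sketch over word 5-gram shingles. The documents, shingle size, hash count, and 0.8 threshold are all hypothetical choices, not the authors'.

        import hashlib

        def shingles(text, n=5):
            # Word n-gram shingles of a document (lowercased).
            words = text.lower().split()
            return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

        def minhash_signature(shingle_set, num_hashes=128):
            # Approximate a shingle set by its minimum hash value under many seeded hashes.
            return [
                min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16) for s in shingle_set)
                for seed in range(num_hashes)
            ]

        def estimated_jaccard(sig_a, sig_b):
            # The fraction of matching MinHash slots estimates Jaccard similarity.
            return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

        docs = [
            "the quick brown fox jumps over the lazy dog near the river bank today",
            "the quick brown fox jumps over the lazy dog near the river bank yesterday",
            "a completely different document about language model training data and filtering",
        ]
        sigs = [minhash_signature(shingles(d)) for d in docs]
        for i in range(len(docs)):
            for j in range(i + 1, len(docs)):
                sim = estimated_jaccard(sigs[i], sigs[j])
                if sim > 0.8:  # flag pairs above a near-duplicate threshold
                    print(f"docs {i} and {j} look like near-duplicates (~{sim:.2f})")

    At corpus scale, the all-pairs comparison is replaced by locality-sensitive hashing over these signatures so that only candidate buckets are compared.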
    We extend semi-supervised learning to the problem of domain adaptation to learn significantly higher-accuracy models that train on one data distribution and test on a different one. With the goal of generality, we introduce AdaMatch, a method that unifies the tasks of unsupervised domain adaptation (UDA), semi-supervised learning (SSL), and semi-supervised domain adaptation (SSDA). In an extensive experimental study, we compare its behavior with respective state-of-the-art techniques from SSL, SSDA, and UDA on vision classification tasks. We find AdaMatch either matches or significantly exceeds the state-of-the-art in each case using the same hyper-parameters regardless of the dataset or task. For example, AdaMatch nearly doubles the accuracy compared to that of the prior state-of-the-art on the UDA task for DomainNet and even exceeds the accuracy of the prior state-of-the-art obtained with pre-training by 6.4% when AdaMatch is trained completely from scratch. Furthermore, by providing AdaMatch with just one labeled example per class from the target domain (i.e., the SSDA setting), we increase the target accuracy by an additional 6.1%, and with 5 labeled examples, by 13.6%.
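    One ingredient of AdaMatch is a relative confidence threshold: rather than keeping target-domain pseudo-labels above a fixed constant, the cutoff is set as a fraction of the model's own confidence on labeled source data. The numpy sketch below illustrates only that idea under assumed batch shapes and an illustrative multiplier; it is not the full algorithm, which also includes random logit interpolation and distribution alignment.

        import numpy as np

        def softmax(logits):
            z = logits - logits.max(axis=1, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        num_classes = 10

        # Hypothetical model outputs on a weakly augmented labeled source batch
        # and a weakly augmented unlabeled target batch.
        source_probs = softmax(3.0 * rng.normal(size=(64, num_classes)))
        target_probs = softmax(3.0 * rng.normal(size=(256, num_classes)))

        # Relative threshold: a fraction of the mean source-domain confidence,
        # rather than a fixed constant such as 0.95.
        tau = 0.9
        threshold = tau * source_probs.max(axis=1).mean()

        pseudo_labels = target_probs.argmax(axis=1)
        mask = target_probs.max(axis=1) >= threshold  # only these contribute to the unlabeled loss

        print(f"threshold={threshold:.3f}, kept {int(mask.sum())}/{len(mask)} target examples")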
    Semi-supervised machine learning models learn from a (small) set of labeled training examples and a (large) set of unlabeled training examples. State-of-the-art models can reach within a few percentage points of fully-supervised training while requiring 100x less labeled data. We study a new class of vulnerabilities: poisoning attacks that modify the unlabeled dataset. In order to be useful, unlabeled datasets are given strictly less review than labeled datasets, and adversaries can therefore poison them easily. By inserting maliciously-crafted unlabeled examples totaling just 0.1% of the dataset size, we can manipulate a model trained on this poisoned dataset to misclassify arbitrary examples at test time (as any desired label). Our attacks are highly effective across datasets and semi-supervised learning methods. We find that more accurate methods (thus more likely to be used) are significantly more vulnerable to poisoning attacks, and as such better training methods are unlikely to prevent this attack. To counter this we explore the space of defenses, and propose two methods that mitigate our attack.
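    The flavor of attack described here can be pictured as injecting a dense chain of unlabeled points connecting an example of the attacker's desired class to the target example, so that consistency-based SSL propagates the label along the chain. The sketch below is purely illustrative (random arrays stand in for images, and the number of poisons and the blending schedule are arbitrary); it is not the paper's exact construction.

        import numpy as np

        def interpolation_path(source_image, target_image, num_poisons=50):
            # Blend from a 'source' example (already classified as the desired label)
            # toward the 'target' example the attacker wants misclassified.
            alphas = np.linspace(0.0, 1.0, num_poisons)
            return np.stack([(1 - a) * source_image + a * target_image for a in alphas])

        rng = np.random.default_rng(0)
        source = rng.random((32, 32, 3))  # stand-in for an image of the attacker's desired class
        target = rng.random((32, 32, 3))  # stand-in for the image the attacker wants misclassified

        poisons = interpolation_path(source, target)
        print(poisons.shape)  # 50 unlabeled examples to inject into the unlabeled training set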
    Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. This domain has seen fast progress recently, at the cost of requiring more complex methods. In this paper we propose FixMatch, an algorithm that is a significant simplification of existing SSL methods. FixMatch first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 labels (just 4 labels per class). We carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success.
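    The unlabeled objective described above fits in a few lines: pseudo-label from the weak view, keep only confident pseudo-labels, and penalize the strong view's prediction against them. The numpy sketch below assumes the model's softmax outputs are already computed for both views; the batch size, class count, and toy inputs are illustrative.

        import numpy as np

        def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
            # Pseudo-label each example from its weakly augmented view.
            pseudo_labels = weak_probs.argmax(axis=1)
            # Keep only high-confidence pseudo-labels.
            confident = weak_probs.max(axis=1) >= threshold
            # Cross-entropy of the strong view's prediction against the hard pseudo-label.
            ce = -np.log(strong_probs[np.arange(len(pseudo_labels)), pseudo_labels] + 1e-12)
            # Unconfident examples contribute zero loss.
            return float((ce * confident).mean())

        def softmax(x):
            e = np.exp(x - x.max(axis=1, keepdims=True))
            return e / e.sum(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        weak = softmax(5.0 * rng.normal(size=(64, 10)))    # predictions on weakly augmented images
        strong = softmax(1.0 * rng.normal(size=(64, 10)))  # predictions on strongly augmented images
        print(fixmatch_unlabeled_loss(weak, strong))

    In the full method this term is added to the ordinary supervised cross-entropy on the labeled batch.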
    We improve the recently proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly-augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between 5x and 16x less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach 93.73% accuracy (compared to MixMatch's accuracy of 93.58% with 4,000 examples) and a median accuracy of 84.92% with just four labels per class. We make our code and data open source at https://github.com/google-research/remixmatch.
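    Distribution alignment as described above is a small operation on the model's predicted class probabilities: scale each prediction by the ratio of the labeled-data class marginal to a running estimate of the model's own marginal on unlabeled data, then renormalize. The sketch below shows that operation alone, with made-up marginals; augmentation anchoring and the learned augmentation policy are not shown.

        import numpy as np

        def distribution_alignment(pred, running_model_marginal, label_marginal, eps=1e-8):
            # Nudge unlabeled-data predictions so their marginal matches the labeled-data marginal.
            aligned = pred * (label_marginal / (running_model_marginal + eps))
            return aligned / aligned.sum(axis=-1, keepdims=True)

        label_marginal = np.array([0.5, 0.3, 0.2])          # class frequencies in labeled data (illustrative)
        running_model_marginal = np.array([0.2, 0.3, 0.5])  # running average of model predictions (illustrative)
        pred = np.array([[0.1, 0.3, 0.6]])                  # one unlabeled-example prediction

        print(distribution_alignment(pred, running_model_marginal, label_marginal))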
    MixMatch: A Holistic Approach to Semi-Supervised Learning
    David Berthelot
    Ian Goodfellow
    Avital Oliver
    Colin Raffel
    NeurIPS (2019) (to appear)
    Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled datasets. In this work, we unify the current dominant approaches for semi-supervised learning to produce a new algorithm called MixMatch. MixMatch works by guessing low-entropy labels for data-augmented unlabeled examples, and then mixes labeled and unlabeled data using MixUp. We show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled-data amounts. We also demonstrate how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, we perform an ablation study to tease apart which components of MixMatch are most important for its success.
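    Two of the ingredients named in the abstract, guessing low-entropy labels and mixing with MixUp, can be sketched directly on arrays. The snippet below shows sharpening (temperature scaling of an averaged prediction) and a MixUp step whose mixing weight is biased toward its first argument; the temperature, the Beta parameter, and the toy inputs are illustrative defaults, not a full reproduction of the algorithm.

        import numpy as np

        def sharpen(avg_probs, T=0.5):
            # Lower the entropy of an averaged prediction (the label-guessing step).
            p = avg_probs ** (1.0 / T)
            return p / p.sum(axis=-1, keepdims=True)

        def mixup(x1, y1, x2, y2, alpha=0.75, rng=np.random.default_rng(0)):
            # MixUp with the mixing weight biased toward the first argument.
            lam = rng.beta(alpha, alpha)
            lam = max(lam, 1.0 - lam)
            return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

        guessed = sharpen(np.array([0.4, 0.35, 0.25]))          # sharpened guess for an unlabeled example
        x_l, y_l = np.ones((8, 8)), np.array([1.0, 0.0, 0.0])   # toy labeled example with one-hot label
        x_u, y_u = np.zeros((8, 8)), guessed                    # toy unlabeled example with guessed label
        mixed_x, mixed_y = mixup(x_l, y_l, x_u, y_u)
        print(guessed, mixed_y)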
    The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
    Chang Liu
    Jernej Kos
    Úlfar Erlingsson
    Dawn Song
    USENIX Security (2019) (to appear)
    This paper describes a testing methodology for quantitatively assessing the risk of unintended memorization of rare or unique sequences in generative sequence models, a common type of neural network. Such models are sometimes trained on sensitive data (e.g., the text of users' private messages); our methodology allows deep-learning practitioners to choose configurations that minimize memorization during training, thereby benefiting privacy. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, if not addressed during training, we show that new, efficient procedures can extract unique, secret sequences such as credit card numbers from trained models. We also show that our testing strategy is practical and easy to apply, e.g., by describing its use for quantitatively preventing data exposure in a production, commercial neural network: a predictive email-composition assistant trained on millions of users' email messages.
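    The methodology's central quantity is an exposure metric: insert a random canary (e.g., a fake credit-card-style sequence) into the training data, then measure how much more likely the trained model finds that canary than other candidates drawn from the same format. A minimal sketch of that computation follows; log_perplexity stands in for the model's scoring function, the paper ranks against the full candidate space rather than a sample, and the toy scorer below simply pretends the model has memorized the canary.

        import math
        import random

        def exposure(log_perplexity, canary, candidate_space):
            # Exposure ~ log2(|candidates|) - log2(rank of the canary by log-perplexity):
            # high exposure means the canary is ranked far more likely than chance.
            canary_score = log_perplexity(canary)
            rank = 1 + sum(log_perplexity(c) < canary_score for c in candidate_space if c != canary)
            return math.log2(len(candidate_space)) - math.log2(rank)

        random.seed(0)
        canary = "my code is 7291"
        candidates = [f"my code is {i:04d}" for i in range(10000)]

        # Toy scorer: random scores everywhere, except the memorized canary looks very likely.
        scores = {c: random.uniform(5.0, 10.0) for c in candidates}
        scores[canary] = 1.0
        print(exposure(lambda s: scores[s], canary, candidates))

    Low exposure means the canary looks no more likely than its peers; exposure approaching the log2 of the candidate-space size means the sequence can effectively be extracted.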