Lucy Vasserman

Lucy is a Software Engineer working on Jigsaw's Conversation AI project, which uses machine learning to spot abuse and harassment online. Previously, Lucy worked on language modeling for speech recognition.
Authored Publications
    Abstract: Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities impact how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed based on raters' self-described identities, rather than at random. We formed three such rater pools for this study: raters from the U.S. who identify as African American, raters who identify as LGBTQ, and raters who identify as neither. Each of these rater pools annotated the same set of comments, which contains many references to these identity groups. We found that rater identity is a statistically significant factor in how raters annotate toxicity for identity-related annotations. Using preliminary content analysis, we examined the comments with the most disagreement between rater pools and found nuanced differences in the toxicity annotations. Next, we trained models on the annotations from each of the different rater pools and compared the scores of these models on comments from several test sets. Finally, we discuss how using raters who self-identify with the subjects of comments can create more inclusive machine learning models and provide more nuanced ratings than those by random raters.
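A minimal sketch of the kind of analysis this abstract describes: testing whether the rate of toxicity labels differs significantly across rater pools annotating the same comments. The pool names echo the study, but the counts and the chi-square test setup are illustrative assumptions, not data or code from the paper.

```python
# Hypothetical counts of (toxic, non-toxic) labels assigned by each
# specialized rater pool to the same comment set; numbers are illustrative.
from scipy.stats import chi2_contingency

counts = {
    "african_american": (412, 1588),
    "lgbtq":            (377, 1623),
    "control":          (305, 1695),
}

table = [list(v) for v in counts.values()]
chi2, p_value, dof, _ = chi2_contingency(table)

for pool, (toxic, non_toxic) in counts.items():
    print(f"{pool:>17}: {toxic / (toxic + non_toxic):.1%} labeled toxic")
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.2g}")
```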
    A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
    Alyssa Whitlock Lees
    Yi Tay
    Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)
    Abstract: On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, developing models that can be effective across a diverse range of languages, usages, and styles is crucial. In this paper, we present Jigsaw Perspective API's new generation of toxic content classifiers, which takes a step towards this unified vision. At the heart of the approach is a single multilingual, token-free Charformer model that is applicable across languages, domains, and tasks. We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings. We additionally outline the techniques employed to make such a byte-level model efficient and feasible for productionization. Through extensive experiments on multilingual toxic comment classification benchmarks derived from real API traffic, and evaluation on an array of code-switching, covert toxicity, emoji-based hate, human-readable obfuscation, distribution shift, and bias evaluation settings, we show that our proposed approach outperforms strong baselines. Finally, we present our findings from deploying this system in production and discuss the benefits we observed over traditional approaches.
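For context, the production system described here is served through the public Perspective API. A minimal sketch of a request against that API, following the publicly documented endpoint and response fields; the API key is a placeholder and the comment text is an arbitrary non-English example:

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; issued via the Perspective API console
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

payload = {
    "comment": {"text": "Tu es vraiment nul"},   # non-English example comment
    "requestedAttributes": {"TOXICITY": {}},
    # Omitting "languages" lets the API detect the language automatically.
}

response = requests.post(URL, json=payload, timeout=10)
response.raise_for_status()
score = response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"TOXICITY score: {score:.3f}")
```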
    Abstract: Online harassment is a major societal challenge that impacts multiple communities. Some members of these communities, such as female journalists and activists, bear significantly higher impacts because their profession requires easy accessibility and transparency about their identity, and involves highlighting stories of injustice. Through a multi-phased qualitative research study involving a focus group and interviews with 27 female journalists and activists, we mapped the journey of a target who goes through harassment. We introduce the PMCR framework as a way to focus on needs for Prevention, Monitoring, Crisis, and Recovery. We focused on Crisis and Recovery, and designed a tool to satisfy a target's needs related to documenting evidence of harassment during the crisis and creating reports that can be shared with support networks for recovery. Finally, we discuss users' feedback on this tool, highlighting the needs of targets as they face this burden, and offer recommendations to future designers and scholars on how to develop tools that can help targets manage their harassment.
    Abstract: Content moderation is often performed by a collaboration between humans and machine learning models. The machine learning models used in this collaboration are typically evaluated using metrics like accuracy or AUROC. However, such metrics do not capture the performance of the combined moderator-model system. Here, we introduce metrics analogous to accuracy and AUC that describe the overall system performance under constraints on human review bandwidth, and that quantify how efficiently and effectively these systems make use of human decision-making. We evaluate the performance of several models using these new metrics as well as existing ones under different review policies (the order in which moderators review comments from the model), finding that simple uncertainty-based review policies outperform traditional toxicity-based ones across a range of human bandwidths. Our results demonstrate the importance of metrics capturing the collaborative nature of the moderator-model system for this task, as well as the utility of uncertainty estimation for the content moderation problem.
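A minimal sketch of the kind of system-level evaluation this abstract describes: under a fixed human review bandwidth, a review policy decides which comments a (here assumed perfect) human moderator sees, the rest are auto-decided by thresholding the model score, and overall accuracy of the combined system is compared for a toxicity-ranked versus an uncertainty-ranked policy. The data, the 0.5 threshold, and the perfect-reviewer assumption are all illustrative, not the paper's exact metrics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: binary true labels and noisy model scores in [0, 1].
labels = rng.integers(0, 2, size=10_000)
scores = np.clip(labels * 0.6 + rng.normal(0.2, 0.25, size=10_000), 0, 1)

def system_accuracy(labels, scores, review_priority, bandwidth):
    """Accuracy of the combined moderator-model system when the `bandwidth`
    highest-priority comments receive a (perfect) human decision and the
    rest are auto-decided by thresholding the model score at 0.5."""
    decisions = (scores >= 0.5).astype(int)
    reviewed = np.argsort(-review_priority)[:bandwidth]
    decisions[reviewed] = labels[reviewed]  # human assumed correct
    return (decisions == labels).mean()

bandwidth = 1_000                            # human reviews per window
toxicity_policy = scores                     # review highest-scoring first
uncertainty_policy = -np.abs(scores - 0.5)   # review most uncertain first

print("toxicity-ranked   :", system_accuracy(labels, scores, toxicity_policy, bandwidth))
print("uncertainty-ranked:", system_accuracy(labels, scores, uncertainty_policy, bandwidth))
```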
    Abstract: Unintended bias in machine learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
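A minimal sketch of threshold-agnostic subgroup metrics in the spirit of those described: restricted AUC variants that compare how the classifier's score distribution behaves on comments that mention an identity group versus the background. The function names, the toy data, and the specific AUC restrictions are illustrative of the idea rather than a reproduction of the paper's exact definitions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bias_aucs(labels, scores, in_subgroup):
    """AUCs restricted to different mixes of subgroup and background examples;
    large gaps between them signal score-distribution differences (bias)."""
    labels, scores, in_subgroup = map(np.asarray, (labels, scores, in_subgroup))
    bg = ~in_subgroup
    # Background positives vs. subgroup negatives, and the reverse.
    bpsn = (bg & (labels == 1)) | (in_subgroup & (labels == 0))
    bnsp = (bg & (labels == 0)) | (in_subgroup & (labels == 1))
    return {
        "subgroup_auc": roc_auc_score(labels[in_subgroup], scores[in_subgroup]),
        "bpsn_auc": roc_auc_score(labels[bpsn], scores[bpsn]),
        "bnsp_auc": roc_auc_score(labels[bnsp], scores[bnsp]),
    }

# Toy data: comments mentioning an identity term score higher regardless of
# label, which shows up as a depressed BPSN AUC.
labels      = np.array([1, 1, 0, 0, 1, 1, 0, 0])
scores      = np.array([0.9, 0.8, 0.1, 0.2, 0.95, 0.9, 0.6, 0.7])
in_subgroup = np.array([False, False, False, False, True, True, True, True])
print(bias_aucs(labels, scores, in_subgroup))
```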
    Model Cards for Model Reporting
    Elena Spitzer
    Inioluwa Deborah Raji
    M. Mitchell
    Simone Sanoian McCloskey Wu
    Timnit Gebru
    (2019)
    Abstract: Trained machine learning models are increasingly used to perform high-impact tasks such as determining crime recidivism rates and predicting health risks. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts they are not well suited for, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards (or M-cards) to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic subgroups (e.g., race, geographic location, sex, Fitzpatrick skin tone) and intersectional subgroups (e.g., age and race, or sex and Fitzpatrick skin tone) that are relevant to the intended application domains. Model cards also disclose the context under which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for models trained to detect smiling faces on the CelebA dataset (Liu et al., 2015) and models trained to detect toxicity in the Conversation AI dataset (Dixon et al., 2018). We propose this work as a step towards the responsible democratization of machine learning and related AI technology, providing context around machine learning models and increasing transparency into how well such models work. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed documentation.
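A minimal sketch of how a model card's sections could be captured as a structured object. The field names follow the section headings proposed in the paper; everything else (the class name, the example values) is an illustrative assumption, not an official schema.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelCard:
    """Skeleton mirroring the sections proposed for model cards."""
    model_details: dict
    intended_use: dict
    factors: dict            # e.g. demographic or phenotypic subgroups
    metrics: dict
    evaluation_data: dict
    training_data: dict
    quantitative_analyses: dict
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

card = ModelCard(
    model_details={"name": "toxicity-classifier", "version": "0.1"},
    intended_use={"primary": "flagging comments for human review",
                  "out_of_scope": "fully automated moderation decisions"},
    factors={"subgroups": ["identity terms mentioned in the comment"]},
    metrics={"overall": "AUC", "per_subgroup": "subgroup AUC"},
    evaluation_data={"source": "held-out annotated comments"},
    training_data={"source": "crowd-annotated online comments"},
    quantitative_analyses={"subgroup_auc": "reported per identity group"},
)
print(json.dumps(asdict(card), indent=2))
```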
    Abstract: We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
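A minimal sketch of the kind of synthetic, template-based evaluation described here: identity terms are filled into otherwise identical sentence templates and the classifier's scores are compared per term. The templates, term list, and `score_comment` stub are hypothetical placeholders for whatever model and test set are being audited.

```python
from statistics import mean

IDENTITY_TERMS = ["gay", "straight", "muslim", "christian", "black", "white"]
TEMPLATES = [
    ("I am a {} person.", 0),           # non-toxic template
    ("Being {} is wonderful.", 0),      # non-toxic template
    ("All {} people are terrible.", 1), # toxic template
]

def score_comment(text):
    """Stand-in for the classifier under audit; replace with a real model."""
    return 0.9 if "terrible" in text else 0.1

# Per-term average score on the non-toxic templates: a large spread across
# terms on otherwise identical sentences points to identity-term bias.
for term in IDENTITY_TERMS:
    scores = [score_comment(t.format(term)) for t, label in TEMPLATES if label == 0]
    print(f"{term:>10}: mean non-toxic-template score = {mean(scores):.2f}")
```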
    Contextual Language Model Adaptation Using Dynamic Classes
    Benjamin Haynor
    IEEE Workshop on Spoken Language Technology (SLT), IEEE (2016)
    Abstract: Recent focus on assistant products has increased the need for extremely flexible speech systems that adapt well to specific users' needs. An important aspect of this is enabling users to make voice commands referencing their own personal data, such as favorite songs, application names, and contacts. Recognition accuracy for common commands such as playing music and sending text messages can be greatly improved if we know a user's preferences. In the past, we have addressed this problem using class-based language models that allow for query-time injection of class instances. However, this approach is limited by the need to train class-based models ahead of time. In this work, we present a significantly more flexible system for query-time injection of user context. Our system dynamically injects the classes into a non-class-based language model. We remove the need to select the classes at language model training time. Instead, our system can vary the classes on a per-client, per-use-case, or even per-request basis. With the ability to inject new classes per request outlined in this work, our speech system can support a diverse set of use cases by taking advantage of a wide range of contextual information specific to each use case.
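A toy sketch of the query-time injection idea, reduced to its simplest form: a recognition hypothesis containing a class symbol is expanded with instances supplied in the request, so the classes can differ per client or per request rather than being fixed when the language model is trained. The class symbol, the contact list, and the rescoring stub are illustrative assumptions and do not reflect the actual speech system.

```python
def expand_class_symbol(hypothesis, class_symbol, instances):
    """Replace a class symbol with each per-request instance."""
    return [hypothesis.replace(class_symbol, inst) for inst in instances]

def rescore(candidates, biased_phrases):
    """Stand-in for LM rescoring: prefer candidates containing biased phrases."""
    return max(candidates, key=lambda c: sum(p in c for p in biased_phrases))

# Per-request context: this user's contacts, supplied with the query.
contacts = ["Alice Zhang", "Ali Sanders", "Alyssa Lee"]
hypothesis = "call $CONTACT on mobile"

candidates = expand_class_symbol(hypothesis, "$CONTACT", contacts)
print(rescore(candidates, biased_phrases=["Ali Sanders"]))
```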
    Sequence-based Class Tagging for Robust Transcription in ASR
    Vlad Schogol
    Keith Hall
    Interspeech 2015, International Speech Communication Association (to appear)