Lucy Vasserman
Lucy is a Software Engineer working on Jigsaw's Conversation AI project, which uses machine learning to spot abuse and harassment online. Previously, Lucy worked on language modeling for speech recognition.
Authored Publications
Preview abstract
Online harassment is a major societal challenge that impacts multiple communities. Some members of the community, like female journalists and activists, bear significantly higher impacts since their
profession requires easy accessibility, transparency about their identity, and involves highlighting stories of injustice. Through
a multi-phased qualitative research study involving a focus group
and interviews with 27 female journalists and activists, we mapped
the journey of a target who goes through harassment. We introduce
the PMCR framework as a way to focus on needs for Prevention, Monitoring, Crisis and Recovery. We focused on Crisis and Recovery, and
designed a tool to satisfy a target’s needs related to documenting
evidence of harassment during the crisis and creating reports that
could be shared with support networks for recovery. Finally, we
discuss users’ feedback on this tool, highlighting needs for targets as
they face the burden, and offer recommendations to future designers
and scholars on how to develop tools that can help targets manage
their harassment.
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Alyssa Whitlock Lees
Yi Tay
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)
Preview abstract
On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, developing models that can be effective across a diverse range of languages, usages, and styles is crucial. In this paper, we present Jigsaw Perspective API’s new generation of toxic content classifiers which takes a step towards this unified vision. At the heart of the approach is a single multilingual token-free Charformer model that is applicable across languages, domains, and tasks. We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings. We additionally outline the techniques employed to make such a byte-level model efficient and feasible for productionization. Through extensive experiments on multilingual toxic comment classification benchmarks derived from real API traffic and evaluation on an array of code-switching, covert toxicity, emoji-based hate, human-readable obfuscation, distribution shift, and bias evaluation settings, we show that our proposed approach outperforms strong baselines. Finally, we present our findings of deploying this system in production, and discuss our observed benefits over traditional approaches.
Preview abstract
Machine learning models are commonly used to detect toxicity in online conversations. These models are trained on datasets annotated by human raters. We explore how raters' self-described identities impact how they annotate toxicity in online comments. We first define the concept of specialized rater pools: rater pools formed based on raters' self-described identities, rather than at random. We formed three such rater pools for this study--specialized rater pools of raters from the U.S. who identify as African American, LGBTQ, and those who identify as neither. Each of these rater pools annotated the same set of comments, which contains many references to these identity groups. We found that rater identity is a statistically significant factor in how raters will annotate toxicity for identity-related annotations. Using preliminary content analysis, we examined the comments with the most disagreement between rater pools and found nuanced differences in the toxicity annotations. Next, we trained models on the annotations from each of the different rater pools, and compared the scores of these models on comments from several test sets. Finally, we discuss how using raters that self-identify with the subjects of comments can create more inclusive machine learning models, and provide more nuanced ratings than those by random raters.
Preview abstract
Content moderation is often performed by a collaboration between humans and machine learning models. The machine learning models used in this collaboration are typically evaluated using metrics like accuracy or AUROC. However, such metrics do not capture the performance of the combined moderator-model system. Here, we introduce metrics analogous to accuracy and AUC that describe the overall system performance under constraints on human review bandwidth, and that quantify how efficiently and effectively these systems make use of human decision-making. We evaluate the performance of several models using these new metrics as well as existing ones under different review policies (the order in which moderators review comments from the model), finding that simple uncertainty-based review policies outperform traditional toxicity-based ones across a range of human bandwidths. Our results demonstrate the importance of metrics capturing the collaborative nature of the moderator-model system for this task, as well as the utility of uncertainty estimation for the content moderation problem.
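As a rough illustration of the idea in this abstract, the sketch below scores a combined moderator-model system when humans can only review a fixed number of comments. It is a minimal sketch and not the paper's metric definitions: the function names, the 0.5 threshold, and the assumption that a human reviewer always assigns the correct label are all hypothetical simplifications.

```python
from typing import List, Sequence


def review_order(scores: Sequence[float], policy: str) -> List[int]:
    """Return comment indices in the order moderators would review them."""
    indices = list(range(len(scores)))
    if policy == "uncertainty":
        # Most uncertain first: scores closest to 0.5.
        return sorted(indices, key=lambda i: abs(scores[i] - 0.5))
    if policy == "toxicity":
        # Most toxic-looking first: highest scores.
        return sorted(indices, key=lambda i: -scores[i])
    raise ValueError(f"unknown policy: {policy}")


def system_accuracy(
    scores: Sequence[float],
    labels: Sequence[int],
    policy: str,
    budget: int,
    threshold: float = 0.5,
) -> float:
    """Accuracy of the human-plus-model system under a review budget.

    Assumption (hypothetical): reviewed comments always get the true label;
    unreviewed comments get the model's thresholded prediction.
    """
    reviewed = set(review_order(scores, policy)[:budget])
    correct = 0
    for i, (score, label) in enumerate(zip(scores, labels)):
        if i in reviewed:
            correct += 1  # human decision, assumed correct
        else:
            prediction = 1 if score >= threshold else 0
            correct += int(prediction == label)
    return correct / len(scores)


# Toy data: the model is wrong only on the two borderline comments.
scores = [0.95, 0.9, 0.55, 0.45, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0]
by_uncertainty = system_accuracy(scores, labels, "uncertainty", budget=2)
by_toxicity = system_accuracy(scores, labels, "toxicity", budget=2)
```

On this toy data the uncertainty-based policy spends its two reviews on the borderline comments the model gets wrong, while the toxicity-based policy spends them on comments the model already classifies correctly.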
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
Daniel Borkan
ACM Conference on Fairness, Accountability, and Transparency (2019) (to appear)
Preview abstract
Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
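To make "threshold-agnostic" concrete, the sketch below compares ROC AUC over all comments with ROC AUC restricted to comments that mention a designated group. This is only an illustrative example of a metric that depends on the score distribution rather than on a decision threshold; the function names and the toy data are hypothetical, and the abstract does not specify the exact metrics in the paper's suite.

```python
from typing import List, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. No decision threshold is involved.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def subgroup_auc(
    scores: Sequence[float], labels: Sequence[int], in_group: Sequence[bool]
) -> float:
    """ROC AUC computed only on examples that belong to the subgroup."""
    group_scores: List[float] = [s for s, g in zip(scores, in_group) if g]
    group_labels: List[int] = [y for y, g in zip(labels, in_group) if g]
    return roc_auc(group_scores, group_labels)


# Toy data: the last two comments mention the designated group.
scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.7]
labels = [1, 1, 0, 0, 1, 0]
in_group = [False, False, False, False, True, True]

overall = roc_auc(scores, labels)
within_group = subgroup_auc(scores, labels, in_group)
```

A large gap between the overall value and the within-group value suggests the classifier separates toxic from non-toxic comments less well for that group, regardless of which threshold is later chosen.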
Model Cards for Model Reporting
Elena Spitzer
Inioluwa Deborah Raji
M. Mitchell
Simone Sanoian McCloskey Wu
Timnit Gebru
(2019)
Preview abstract
Trained machine learning models are increasingly used to perform high impact tasks such as determining crime recidivism rates and predicting health risks. In order to clarify the intended use cases of machine learning models and minimize their usage in contexts they are not well-suited for, we recommend that released models be accompanied by documentation detailing their performance characteristics. In this paper, we propose a framework that we call model cards (or M-cards) to encourage such transparent model reporting. Model cards are short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of conditions, such as across different cultural, demographic, or phenotypic subgroups (e.g., race, geographic location, sex, Fitzpatrick skin tone) and intersectional subgroups (e.g., age and race, or sex and Fitzpatrick skin tone) that are relevant to the intended application domains. Model cards also disclose the context under which models are intended to be used, details of the performance evaluation procedures, and other relevant information. While we focus primarily on human-centered machine learning models in the application fields of computer vision and natural language processing, this framework can be used to document any trained machine learning model. To solidify the concept, we provide cards for models trained to detect smiling faces on the CelebA dataset (Liu et al., 2015) and models trained to detect toxicity in the Conversation AI dataset (Dixon et al., 2018). We propose this work as a step towards the responsible democratization of machine learning and related AI technology, providing context around machine learning models and increasing the transparency into how well such models work. We hope this work encourages those releasing trained machine learning models to accompany model releases with similar detailed documentation.
Measuring and Mitigating Unintended Bias in Text Classification
John Li
AAAI/ACM Conference on AI, Ethics, and Society (2018)
Preview abstract
We introduce and illustrate a new approach to measuring and
mitigating unintended bias in machine learning models. Our
definition of unintended bias is parameterized by a test set
and a subset of input features. We illustrate how this can
be used to evaluate text classifiers using a synthetic test set
and a public corpus of comments annotated for toxicity from
Wikipedia Talk pages. We also demonstrate how imbalances
in training data can lead to unintended bias in the resulting
models, and therefore potentially unfair applications. We use
a set of common demographic identity terms as the subset of
input features on which we measure bias. This technique permits
analysis in the common scenario where demographic information
on authors and readers is unavailable, so that bias
mitigation must focus on the content of the text itself. The
mitigation method we introduce is an unsupervised approach
based on balancing the training dataset. We demonstrate that
this approach reduces the unintended bias without compromising
overall model quality.
Contextual Language Model Adaptation Using Dynamic Classes
Benjamin Haynor
IEEE Workshop on Spoken Language Technology (SLT), IEEE (2016)
Preview abstract
Recent focus on assistant products has increased the need for extremely
flexible speech systems that adapt
well to specific users' needs. An important aspect of this is enabling users to
make voice commands referencing their own personal data, such as favorite songs,
application names, and contacts. Recognition accuracy for common commands such
as playing music and sending text messages can be greatly improved if we know a
user's preferences.
In the past, we have addressed this problem using class-based language models
that allow for query-time injection of class instances. However, this approach
is limited by the need to train class-based models ahead of time.
In this work, we present a significantly more flexible system for query-time
injection of user context. Our system dynamically injects the classes
into a non-class-based language model. We remove the need to select the classes
at language model training time. Instead, our system can vary the classes on a
per-client, per-use case, or even a per-request basis.
With the ability to inject new classes per-request outlined in this work, our
speech system can support a diverse set of use cases by
taking advantage of a wide range of contextual information specific to each
use case.
Sequence-based Class Tagging for Robust Transcription in ASR
Vlad Schogol
Keith Hall
Interspeech 2015, International Speech Communication Association (to appear)