Jeffrey Sorensen

Authored Publications
    Context Sensitivity Estimation in Toxicity Detection
    Alexandros Xenos
    Ioannis Pavlopoulos
    Ion Androutsopoulos
    First Monday (2022)
    Abstract: Context-sensitive posts are rare in toxicity detection datasets. This fact leads to models that disregard even the conversational context (e.g., the parent post) when they predict toxicity. This work introduces the task of context-sensitivity estimation in toxicity detection. We present and publicly release the first dataset that can be used to build context-sensitivity estimation systems. We further show that systems trained on our dataset can be effectively used to detect posts whose toxicity depends on the parent post.
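    The task described above lends itself to a standard pair-input text classifier. Below is a minimal sketch of that setup, assuming toy (parent, target) examples and an off-the-shelf scikit-learn pipeline; it is illustrative only, not the authors' released code or dataset.

```python
# Minimal sketch of context-sensitivity estimation as binary classification.
# Assumption: examples are (parent_text, target_text, is_context_sensitive)
# triples, where label 1 means the toxicity judgment depends on the parent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

examples = [
    ("Nice work on the patch!", "You too.", 0),
    ("You are a fraud.", "You too.", 1),
]

# Join parent and target with a separator so the model sees both posts.
texts = [f"{parent} [SEP] {target}" for parent, target, _ in examples]
labels = [label for _, _, label in examples]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Probability that a new post is context-sensitive.
print(clf.predict_proba(["Go jump in a lake. [SEP] Great idea!"])[:, 1])
```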
    A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
    Alyssa Whitlock Lees
    Yi Tay
    Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2022)
    Abstract: On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, developing models that can be effective across a diverse range of languages, usages, and styles is crucial. In this paper, we present Jigsaw Perspective API's new generation of toxic content classifiers, which takes a step towards this unified vision. At the heart of the approach is a single multilingual token-free Charformer model that is applicable across languages, domains, and tasks. We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings. We additionally outline the techniques employed to make such a byte-level model efficient and feasible for productionization. Through extensive experiments on multilingual toxic comment classification benchmarks derived from real API traffic and evaluation on an array of code-switching, covert toxicity, emoji-based hate, human-readable obfuscation, distribution shift, and bias evaluation settings, we show that our proposed approach outperforms strong baselines. Finally, we present our findings of deploying this system in production, and discuss our observed benefits over traditional approaches.
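    The classifiers described above power the publicly available Perspective API. As a usage illustration, here is a minimal sketch of scoring a comment's TOXICITY attribute over the public REST endpoint; the API key is a placeholder and must be obtained through Google Cloud.

```python
# Minimal sketch of scoring a comment with the Perspective API's TOXICITY
# attribute. PERSPECTIVE_API_KEY is a placeholder, not a working key.
import requests

API_KEY = "PERSPECTIVE_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

payload = {
    "comment": {"text": "Your argument is complete garbage."},
    "requestedAttributes": {"TOXICITY": {}},
}

resp = requests.post(URL, json=payload, timeout=10)
resp.raise_for_status()

# The summary score is a probability-like value in [0, 1].
score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"toxicity score: {score:.3f}")
```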
    Abstract: Toxicity detection is of growing importance in social and other media to allow healthy discussions. Most previous work ignores the context of user posts, which can mislead systems and moderators to incorrectly classify toxic posts as non-toxic, or vice versa. Recent work concluded that datasets containing many more context-aware posts are needed to correctly train and evaluate context-aware toxicity classifiers. We re-annotated an existing toxicity dataset, adding context-aware ground truth to the existing context-unaware ground truth. Exploiting both types of ground truth, context-aware and context-unaware, we develop and evaluate a classifier that can determine if a post is context-sensitive or not. The classifier can be used to collect more context-sensitive posts. It can also be used to determine when a moderator needs to consider the parent post (to decrease the moderation cost) or when a context-aware toxicity detection system has to be invoked, as opposed to using a simpler context-unaware system. We also discuss how the context-sensitivity classifier can help avoid a possibly malicious exploitation of the context-unawareness of current toxicity detectors. Datasets and code of models addressing this novel task will become publicly available.
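    The moderation routing described in the abstract above can be summarized in a few lines. A sketch, with all three scoring functions as hypothetical stand-ins rather than released models:

```python
# Illustrative routing: run a cheap context-unaware scorer by default,
# and invoke the context-aware scorer only when the post is predicted
# to be context-sensitive. All three functions are placeholders.

def is_context_sensitive(post: str, parent: str) -> bool:
    return "you" in post.lower()          # placeholder heuristic

def score_without_context(post: str) -> float:
    return 0.1                            # placeholder toxicity score

def score_with_context(post: str, parent: str) -> float:
    return 0.8                            # placeholder toxicity score

def moderate(post: str, parent: str | None) -> float:
    if parent is not None and is_context_sensitive(post, parent):
        return score_with_context(post, parent)   # expensive path
    return score_without_context(post)            # cheap default path

print(moderate("You too.", parent="You are a fraud."))
```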
    Jigsaw @ AMI and HaSpeeDe2: Fine-Tuning a Pre-Trained Comment-Domain BERT Model
    Alyssa Whitlock Lees
    Ian Kivlichan
    Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), CEUR.org, Online (to appear)
    Abstract: The Google Jigsaw team produced submissions for two of the EVALITA 2020 shared tasks, based in part on the technology that powers the publicly available Perspective API comment evaluation service. We present a basic description of our submitted results and a review of the types of errors that our system made in these shared tasks.
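    For readers unfamiliar with the general recipe, below is a minimal sketch of fine-tuning a pre-trained BERT checkpoint for binary comment classification with Hugging Face Transformers. The checkpoint, toy data, and hyperparameters are assumptions for illustration; the team's comment-domain pre-trained model is not shown here.

```python
# Sketch of fine-tuning a pre-trained BERT checkpoint for binary
# classification; checkpoint and data are illustrative assumptions.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-multilingual-cased"  # assumed multilingual base
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)

texts = ["sei un idiota", "grazie per l'aiuto"]  # toy Italian examples
labels = [1, 0]
enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        return {k: v[i] for k, v in enc.items()} | {
            "labels": torch.tensor(labels[i])}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ToyDataset(),
)
trainer.train()
```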
    Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
    Abstract: Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
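    The metric suite referenced above is AUC-based. Here is a sketch of three such threshold-agnostic metrics (Subgroup AUC, BPSN AUC, and BNSP AUC) on toy arrays, assuming binary toxicity labels and a boolean subgroup-membership mask; the arrays are placeholders, not the paper's data.

```python
# Sketch of threshold-agnostic, AUC-based bias metrics on toy data.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])          # toxic = 1
y_score = np.array([0.9, 0.2, 0.6, 0.4, 0.8, 0.1, 0.7, 0.5])
in_subgroup = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

# Subgroup AUC: discrimination within the identity subgroup alone.
subgroup_auc = roc_auc_score(y_true[in_subgroup], y_score[in_subgroup])

# BPSN AUC: background positives vs. subgroup negatives; low values mean
# non-toxic subgroup comments score above toxic background comments.
mask = (in_subgroup & (y_true == 0)) | (~in_subgroup & (y_true == 1))
bpsn_auc = roc_auc_score(y_true[mask], y_score[mask])

# BNSP AUC: background negatives vs. subgroup positives.
mask = (in_subgroup & (y_true == 1)) | (~in_subgroup & (y_true == 0))
bnsp_auc = roc_auc_score(y_true[mask], y_score[mask])

print(subgroup_auc, bpsn_auc, bnsp_auc)
```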
    WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community
    Abstract: We present a corpus that encompasses the complete history of conversations between contributors of English Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus's potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person's conversational behavior depends on how they relate to the discussion venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated.
    Measuring and Mitigating Unintended Bias in Text Classification
    Abstract: We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
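    The balancing idea can be illustrated in a few lines. This is a deliberately simplified sketch of the general approach, assuming toy data and a small pool of additional non-toxic examples to draw from; it is not the paper's actual pipeline.

```python
# Simplified sketch of dataset balancing: if an identity term appears
# mostly in toxic examples, top up the training set with non-toxic
# examples mentioning that term. All data here is a toy placeholder.
from collections import Counter

train = [
    ("gay people should have equal rights", 0),
    ("i hate gay people", 1),
    ("gay marriage is wrong", 1),
    ("the weather is nice", 0),
]
identity_terms = ["gay"]
extra_nontoxic_pool = [
    "my gay friends threw a great party",
    "she is proud to be gay",
]

for term in identity_terms:
    counts = Counter(label for text, label in train if term in text)
    # Add non-toxic mentions until the term is no longer skewed toxic.
    while counts[1] > counts[0] and extra_nontoxic_pool:
        train.append((extra_nontoxic_pool.pop(), 0))
        counts[0] += 1

print(train)
```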
    Unary Data Structures for Language Models
    Interspeech 2011, International Speech Communication Association, pp. 1425-1428
    Abstract: Language models are important components of speech recognition and machine translation systems. Trained on billions of words, and consisting of billions of parameters, language models often are the single largest components of these systems. There have been many proposed techniques to reduce the storage requirements for language models. A technique based upon pointer-free compact storage of ordinal trees shows compression competitive with the best proposed systems, while retaining the full finite state structure, and without using computationally expensive block compression schemes or lossy quantization techniques.
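    The pointer-free tree representation referenced above belongs to the family of level-order unary degree sequences (LOUDS-style encodings). Below is a minimal sketch of the core encoding step on a toy tree; this is an assumption-laden illustration of the idea, and real implementations add rank/select structures for constant-time child navigation.

```python
# Sketch of a LOUDS-style unary encoding: visit nodes in level order and
# write each node's child count as that many 1 bits followed by a 0.
from collections import deque

# children maps a node id to its ordered child ids; 0 is the root.
children = {0: [1, 2], 1: [3], 2: [], 3: []}

bits = []
queue = deque([0])
while queue:
    node = queue.popleft()
    kids = children[node]
    bits.extend([1] * len(kids) + [0])  # degree written in unary
    queue.extend(kids)

print("".join(map(str, bits)))  # prints 1101000 for the toy tree
```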