Lucas Dixon

Lucas is a research scientist and lead of PAIR (People and AI Research). He works on visualisation, explainability and control of machine learning systems, and specifically language models. His work explores how people can productively and fairly benefit from machine learning systems. Previously, he was Chief Scientist at Jigsaw, where he founded engineering and research. He has worked on a range of topics including security, formal logics, machine learning, and data visualization. For example, he worked on uProxy & Outline, Project Shield, DigitalAttackMap, Syria Defection Tracker, unfiltered.news, Conversation AI and Perspective API. Before Google, Lucas completed his PhD and worked at the University of Edinburgh on the automation of mathematical reasoning and graphical languages, mostly applied to quantum information. He also helped run a non-profit working towards more rational and informed discussion and decision making, and was a co-founder of TheoryMine, a playful take on automating mathematical discovery.
Authored Publications
    Abstract: Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
    On Natural Language User Profiles for Transparent and Scrutable Recommendation
    Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22) (2022)
    Abstract: Natural interaction with recommendation and personalized search systems has received tremendous attention in recent years. We focus on the challenge of supporting people's understanding and control of these systems and explore a fundamentally new way of thinking about representation of knowledge in recommendation and personalization systems. Specifically, we argue that it may be both desirable and possible for algorithms that use natural language representations of users' preferences to be developed. We make the case that this could provide significantly greater transparency, as well as affordances for practical actionable interrogation of, and control over, recommendations. Moreover, we argue that such an approach, if successfully applied, may enable a major step towards systems that rely less on noisy implicit observations while increasing portability of knowledge of one's interests.
    Sparsely Activated Language Models are Efficient In-Context Learners
    Barret Richard Zoph
    Dmitry (Dima) Lepikhin
    Emma Wang
    Kun Zhang
    Liam B. Fedus
    Maarten Paul Bosma
    Marie Pellat
    Maxim Krikun
    Nan Du
    Simon Tong
    Tao Wang
    Toju Duke
    Yuanzhong Xu
    Zongwei Zhou
    Abstract: Scaling language models with more data, compute and parameters has driven significant progress in natural language processing. For example, thanks to scaling, GPT-3 was able to achieve strong performance on few-shot learning. However, training these large dense models requires significant amounts of computing resources. In this paper, we develop a family of sparsely activated mixture-of-expert language models named GLaM (Generalist Language Model), which can have many more parameters but require significantly less training cost than dense models. The largest GLaM has 1.2 trillion parameters, which is approximately 7x larger than GPT-3 but can be trained more efficiently. With only 1/3 of the energy consumption needed to train GPT-3, GLaM achieves better overall performance on 29 zero-shot and one-shot NLP tasks. For example, GLaM gets 75.0% one-shot exact match accuracy on the TriviaQA test server, a significant improvement over the 68.0% obtained by GPT-3.
    Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis
    Shayegan Omidshafiei
    Yannick Assogba
    Advances in Neural Information Processing Systems (NeurIPS) (2022) (to appear)
    Abstract: Each year, expert-level performance is attained in increasingly-complex multiagent domains, notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent level, detection of behavior changepoints throughout training, discovery of core behavioral concepts, demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain, and also illustrate that the approach can disentangle previously-trained policies in OpenAI's hide-and-seek domain.
    Context Sensitivity Estimation in Toxicity Detection
    Alexandros Xenos
    Ioannis Pavlopoulos
    Ion Androutsopoulos
    First Monday (2022)
    Abstract: Context-sensitive posts are rare in toxicity detection datasets. This fact leads to models that disregard even the conversational context (e.g., the parent post) when they predict toxicity. This work introduces the task of context-sensitivity estimation in toxicity detection. We present and publicly release the first dataset that can be used to build context-sensitivity estimation systems. We further show that systems trained on our dataset can be effectively used to detect posts that depend on the parent post, regarding toxicity detection.
    Abstract: Toxicity detection is of growing importance in social and other media to allow healthy discussions. Most previous work ignores the context of user posts, which can mislead systems and moderators to incorrectly classify toxic posts as non-toxic, or vice versa. Recent work concluded that datasets containing many more context-aware posts are needed to correctly train and evaluate context-aware toxicity classifiers. We re-annotated an existing toxicity dataset, adding context-aware ground truth to the existing context-unaware ground truth. Exploiting both types of ground truth, context aware and unaware, we develop and evaluate a classifier that can determine if a post is context-sensitive or not. The classifier can be used to collect more context-sensitive posts. It can also be used to determine when a moderator needs to consider the parent post (to decrease the moderation cost) or when a context-aware toxicity detection system has to be invoked, as opposed to using a simpler context-unaware system. We also discuss how the context-sensitivity classifier can help avoid a possibly malicious exploitation of the context-unawareness of current toxicity detectors. Datasets and code of models addressing this novel task will become publicly available.
    Abstract: Unintended bias in Machine Learning can manifest as systemic differences in performance for different demographic groups, potentially compounding existing challenges to fairness in society at large. In this paper, we introduce a suite of threshold-agnostic metrics that provide a nuanced view of this unintended bias, by considering the various ways that a classifier's score distribution can vary across designated groups. We also introduce a large new test set of online comments with crowd-sourced annotations for identity references. We use this to show how our metrics can be used to find new and potentially subtle unintended bias in existing public models.
    Abstract: We present a corpus that encompasses the complete history of conversations between contributors of English Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus' potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person's conversational behavior depends on how they relate to the discussion venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated.
    Conversations Gone Awry: Detecting Warning Signs of Conversational Failure
    Justine Zhang
    Jonathan P. Chang
    Cristian Danescu-Niculescu-Mizil
    Dario Taraborelli
    Proceedings of ACL, ACM Digital Library (2018)
    Abstract: One of the main challenges online social systems face today is the prevalence of toxic behavior, such as harassment and personal attacks. This type of antisocial behavior is especially perplexing and disruptive when it emerges in the context of healthy conversations where, at least in principle, participants share a common goal and set of norms. In this work, we introduce the task of predicting whether a given conversation is on the verge of being derailed by the antisocial actions of one of its participants. As opposed to detecting toxic behavior after the fact, this task aims to enable early, actionable information at a time when the conversation might still be salvaged. We focus on two methodological challenges. First, through a combination of machine learning, crowd-sourcing and causal inference techniques applied to a novel dataset of 8 million conversations, we design a controlled setting that allows us to compare healthy conversations that deteriorate with similar conversations that stay on track, while accounting for confounding factors such as topical focus and number of participants. Second, we propose a framework for applying and evaluating linguistic, conversational and social patterns in the task of predicting the future trajectory of a conversation. Our primary result is that a simple model using conversational and linguistic features can achieve performance close to that of humans in predicting whether a civil conversation will go awry. We also show that the conversational context is more informative in this task than the history and experience of the participants. By demonstrating the feasibility of the prediction task, and by providing a labeled dataset, as well as a human baseline, we lay the ground for further work on methods for detecting early warning signs, and for eventually preventing, antisocial behavior in online discussions.
    Abstract: We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
    Ex Machina: Personal attacks seen at scale
    Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1391-1399
    Abstract: The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.