- Justine Zhang
- Jonathan P. Chang
- Cristian Danescu-Niculescu-Mizil
- Lucas Dixon
- Dario Taraborelli
- Nithum Thain
One of the main challenges online social systems face today is the prevalence of toxic behavior, such as harassment and personal attacks. This type of antisocial behavior is especially perplexing and disruptive when it emerges in the context of healthy conversations where, at least in principle, participants share a common goal and set of norms. In this work, we introduce the task of predicting whether a given conversation is on the verge of being derailed by the antisocial actions of one of its participants. As opposed to detecting toxic behavior after the fact, this task aims to provide early, actionable information at a time when the conversation might still be salvaged.
We focus on two methodological challenges. First, through a combination of machine learning, crowd-sourcing and causal inference techniques applied to a novel dataset of 8 million conversations, we design a controlled setting that allows us to compare healthy conversations that deteriorate with similar conversations that stay on track, while accounting for confounding factors such as topical focus and number of participants. Second, we propose a framework for applying and evaluating linguistic, conversational and social patterns in the task of predicting the future trajectory of a conversation.
Our primary result is that a simple model using conversational and linguistic features can achieve performance close to that of humans in predicting whether a civil conversation will go awry. We also show that the conversational context is more informative in this task than the history and experience of the participants. By demonstrating the feasibility of the prediction task, and by providing a labeled dataset as well as a human baseline, we lay the groundwork for further work on methods for detecting early warning signs of antisocial behavior in online discussions, and for eventually preventing it.