WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community
Abstract
We present a corpus that encompasses the complete history of conversations between contributors to English Wikipedia, one of the largest online collaborative communities.
By recording the intermediate states of conversations---including not only comments and replies, but also their modifications, deletions and restorations---this data offers an unprecedented view of online conversation.
This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration.
We illustrate the corpus' potential with two case studies that highlight new perspectives on earlier work.
First, we explore how a person's conversational behavior depends on how they relate to the discussion venue.
Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated.