WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

Cristian Danescu
Dario Taraborelli
Yiqing Hua
ACL(2018), pp. 5

We present a corpus that encompasses the complete history of conversations between contributors to English Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions and restorations), this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus' potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person's conversational behavior depends on how they relate to the discussion venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated.