Characterizing Online Discussion Using Coarse Discourse Sequences

Amy Zhang; Bryan Culbertson; Praveen Paritosh

11th AAAI International Conference on Web and Social Media (ICWSM) (2017)

Abstract

In this work, we present a novel method for classifying comments in online discussions into a set of coarse discourse acts, with the goal of better understanding discussions at scale. To facilitate this study, we devise a categorization of coarse discourse acts designed to encompass general online discussion and allow for easy annotation by crowd workers. We collect and release a corpus of over 9,000 threads, randomly sampled from the site Reddit and comprising over 100,000 comments manually annotated with discourse acts via paid crowdsourcing. Using our corpus, we demonstrate how the analysis of discourse acts can characterize different types of discussions, including discourse sequences such as Q&A pairs and chains of disagreement, as well as different communities. Finally, we conduct experiments to predict discourse acts using our corpus, finding that structured prediction models such as conditional random fields can achieve an F1 score of 75%. We also demonstrate how broadening the set of discourse acts from simply question and answer to a richer set of categories can improve the recall performance of Q&A extraction.
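To make the idea of a discourse sequence concrete, the following is a minimal Python sketch of how Q&A pairs and chains of disagreement might be read off a thread once each comment carries a discourse act label. It is an illustration only, not the method used in the paper: the `Comment` class, its field names, the label strings, and the toy thread are all hypothetical, and the released corpus may use a different schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Comment:
    """One annotated comment. All field names are hypothetical."""
    comment_id: str
    parent_id: Optional[str]  # None for the initial post of the thread
    act: str                  # assumed label string, e.g. "question"


def qa_pairs(thread: List[Comment]) -> List[Tuple[str, str]]:
    """Return (question_id, answer_id) for each answer replying directly to a question."""
    by_id = {c.comment_id: c for c in thread}
    pairs = []
    for c in thread:
        parent = by_id.get(c.parent_id)  # None when c is the initial post
        if c.act == "answer" and parent is not None and parent.act == "question":
            pairs.append((parent.comment_id, c.comment_id))
    return pairs


def longest_disagreement_chain(thread: List[Comment]) -> int:
    """Length of the longest reply chain in which every comment is a disagreement."""
    by_id = {c.comment_id: c for c in thread}
    best = 0
    for c in thread:
        length, node = 0, c
        # Walk up the reply tree while the labels stay "disagreement".
        while node is not None and node.act == "disagreement":
            length += 1
            node = by_id.get(node.parent_id)
        best = max(best, length)
    return best


# Toy thread, invented for illustration (not taken from the corpus).
thread = [
    Comment("c1", None, "question"),
    Comment("c2", "c1", "answer"),
    Comment("c3", "c2", "disagreement"),
    Comment("c4", "c3", "disagreement"),
    Comment("c5", "c1", "answer"),
]

pairs = qa_pairs(thread)
chain = longest_disagreement_chain(thread)
```

In this toy thread, both answers reply directly to the opening question, and the two consecutive disagreements under the first answer form a chain of length 2. The sketch only matches direct replies; a real analysis would need to decide how to treat answers nested deeper in the reply tree.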
