SocialQuotes: Learning Contextual Roles of Social Media Quotes on the Web

Abstract

A vast amount of human discussion, storytelling, content creation,
and reporting now occurs on social media platforms. As such, social
media posts are often quoted on web pages as context. In this
paper, we argue that these quotations and their surrounding page
context provide a rich, platform-independent source of data for
studying the intersection of natural language and social media.
We introduce a taxonomy of quotation roles that categorizes how
social media posts are used within content. We release a dataset
of 38M social quotes derived from the Common Crawl, and role
labels for a subset assessed by human raters. We show that the
interplay of accounts, roles, and topics across the web graph reveal
valuable social diffusion patterns, and that roles can be predicted
with fine-tuned large language models from web context.