AMMEBA: A Large-Scale Survey and Dataset of Media-Based Misinformation In-The-Wild

Nick Dufour
Arkanath Pathak
Pouya Samangouei
Nikki Hariri
Shashi Deshetti
Andrew Dudfield
Christopher Guess
Pablo Hernandez
Bobby Tran
Mevan Babakar
arXiv (2024)

Abstract

The prevalence and harms of online misinformation are a perennial concern for internet platforms, governments worldwide, and the press. Over time, information shared online has become more media-heavy and misinformation has readily adapted to these new modalities. The rise of generative AI-based tools, which provide widely accessible methods for synthesizing realistic audio, images, video and human-like text, has made these concerns substantially more acute. Despite intense interest on the part of the public and significant press coverage, quantitative studies on the prevalence and modality of misinformation remain scarce. Here, we present the results of a two-year study using human raters to annotate online misinformation, focusing on misinformation that relies on the presence of media (i.e., images, audio clips and video), based on claims assessed in a large corpus of publicly accessible fact checks bearing the ClaimReview markup. We visualize the rise of generative AI-based content in misinformation claims, and show that, despite its coverage of novel and sophisticated methods, simple context manipulations are the dominant form of media-based misinformation present online. The dataset collected will be made publicly available, and we hope that these data will serve both as a means of evaluating mitigation methods in a realistic setting and as a first-of-its-kind census of the types and modalities of online misinformation.