Abstract
Generative AI capabilities have enabled malicious actors to flood online platforms with "AI slop"—mass-produced, low-quality synthetic media designed to overwhelm traditional integrity systems. These adversarial campaigns often utilize coordinated networks to distribute unique, localized variations of synthetic content, rendering static detection methods ineffective. The signals to detect coordination often have recall gaps. The content is not exactly duplicative to be in the same repetitive video cluster. The abusers however show similar patterns of behavior which need forensics. Manual forensic investigations cannot scale to match the velocity of these generative attacks.
To address this, we present SAFE (Scaled Abuse Forensics Examiner), an automated multi-agent architecture designed for the scalable forensics of adversarial synthetic media. The system decomposes the investigation process into specialized agents: a Cluster Understanding Agent specialized in analyzing the relations between channels in a cluster, a Behavior Understanding Agent that identifies inorganic spatiotemporal patterns, and a Content Understanding Agent that utilizes LoRA-adapted Large Language Models (LLMs) and few-shot learning to detect existing policy violations and spirit of the policy violations respectively . A Root Agent synthesizes these multimodal signals to render a final verdict. Early deployment results indicate that SAFE significantly accelerates the identification of novel synthetic threats, reducing forensic investigation time compared to human-in-the-loop workflows.