Relevant Document Discovery for Fact-Checking Articles
Abstract
With the support of major search platforms such as Google and Bing, fact-checking articles, which can be identified by their adoption of the schema.org ClaimReview structured markup, have gained widespread recognition for their role in the fight against digital misinformation. A claim-relevant document is an online document that addresses, and potentially expresses a stance towards, some claim. The claim-relevance discovery problem, then, is to find claim-relevant documents. Depending on the verdict from the fact check, claim-relevance discovery can help identify online misinformation.
In this paper, we provide an initial approach to the claim-relevance discovery problem by leveraging various information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features in the fact-checking article. Second, we apply a relevance classifier to filter away documents that do not address the claim. Third, we apply a language feature based classifier to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and beats state-of-the-art baselines. Finally, we highlight a rich set of case studies to demonstrate the myriad of remaining challenges and that this problem is far from being solved.
In this paper, we provide an initial approach to the claim-relevance discovery problem by leveraging various information retrieval and machine learning techniques. The system consists of three phases. First, we retrieve candidate documents based on various features in the fact-checking article. Second, we apply a relevance classifier to filter away documents that do not address the claim. Third, we apply a language feature based classifier to distinguish documents with different stances towards the claim. We experimentally demonstrate that our solution achieves solid results on a large-scale dataset and beats state-of-the-art baselines. Finally, we highlight a rich set of case studies to demonstrate the myriad of remaining challenges and that this problem is far from being solved.