Google Research

Assessing The Factual Accuracy of Text Generation

The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'19) (2019) (to appear)

Abstract

We propose an automatic metric to reflect the factual accuracy of generated text as an alternative to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We consider models that can extract fact triplets from text and then use them to define a metric that compares triplets extracted from generated summaries and reference texts. We show that this metric correlates with human evaluation of factual accuracy better than ROUGE does. To build these models, we introduce a new Wikidata-based dataset for fact extraction, and show that a transformer-based attention model can learn to predict structured fact triplets as well as perform favorably compared to more traditional two-stage approaches (entity recognition and relationship classification).
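The abstract describes comparing fact triplets extracted from a generated summary against those extracted from the reference text. A minimal sketch of such a comparison is an exact-match overlap score (precision/recall/F1) over sets of (subject, relation, object) tuples; the paper's actual metric and matching rules may differ, and the `fact_f1` function and example triplets below are illustrative assumptions, not the authors' implementation.

```python
def fact_f1(generated, reference):
    """F1 overlap between two collections of (subject, relation, object) triplets.

    This is a simplified stand-in for a factual-accuracy metric: it rewards
    generated triplets that exactly match reference triplets and penalizes
    both missing facts (low recall) and hallucinated facts (low precision).
    """
    gen, ref = set(generated), set(reference)
    if not gen or not ref:
        return 0.0
    overlap = gen & ref
    precision = len(overlap) / len(gen)
    recall = len(overlap) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical triplets for illustration only.
reference = [("obama", "born_in", "hawaii"),
             ("obama", "profession", "politician")]
generated = [("obama", "born_in", "hawaii"),      # matches the reference
             ("obama", "born_in", "kenya")]       # hallucinated fact
print(fact_f1(generated, reference))  # 0.5 (precision 1/2, recall 1/2)
```

Unlike ROUGE, which scores n-gram overlap, this kind of metric is unaffected by paraphrasing as long as the same structured facts are extracted from both texts.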
