Google Research

Assessing The Factual Accuracy of Text Generation

The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'19) (2019) (to appear)

Abstract

We propose an automatic metric to reflect the factual accuracy of generated text as an alternative to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We consider models that can extract fact triplets from text and then use them to define a metric that compares triplets extracted from generated summaries and reference texts. We show that this metric correlates with human evaluation of factual accuracy better than ROUGE does. To build these models, we introduce a new Wikidata-based dataset for fact extraction, and show that a transformer-based attention model can learn to predict structured fact triplets as well as perform favorably compared to more traditional two-stage approaches (entity recognition and relationship classification).
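The abstract describes comparing fact triplets extracted from a generated summary against those extracted from the reference text. A minimal sketch of such a comparison is an exact-match overlap score (precision/recall/F1) over sets of (subject, relation, object) tuples; the paper's actual metric and matching rules may differ, and the `fact_f1` function and example triplets below are illustrative assumptions, not the authors' implementation.

```python
def fact_f1(generated, reference):
    """F1 overlap between two collections of (subject, relation, object) triplets.

    This is a simplified stand-in for a factual-accuracy metric: it rewards
    generated triplets that exactly match reference triplets and penalizes
    both missing facts (low recall) and hallucinated facts (low precision).
    """
    gen, ref = set(generated), set(reference)
    if not gen or not ref:
        return 0.0
    overlap = gen & ref
    precision = len(overlap) / len(gen)
    recall = len(overlap) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical triplets for illustration only.
reference = [("obama", "born_in", "hawaii"),
             ("obama", "profession", "politician")]
generated = [("obama", "born_in", "hawaii"),      # matches the reference
             ("obama", "born_in", "kenya")]       # hallucinated fact
print(fact_f1(generated, reference))  # 0.5 (precision 1/2, recall 1/2)
```

Unlike ROUGE, which scores n-gram overlap, this kind of metric is unaffected by paraphrasing as long as the same structured facts are extracted from both texts.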
