Measuring Attribution in Natural Language Generation Models

Hannah Rashkin; Vitaly Nikolaev; Matthew Lamm; Lora Aroyo; Michael Collins; Dipanjan Das; Slav Petrov; Gaurav Singh Tomar; Iulia Turc; David Reitter

Computational Linguistics, 49 (2023), pp. 777-840

Abstract

With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.

Research Areas

Natural language processing
