Towards Attributed Large Language Models: Attributed Question Answering as a Case Study in Modeling and Evaluation


Large language models (LLMs) have shown impressive results across a variety of natural language understanding and generation tasks while requiring little or no direct supervision. We believe, however, that the ability of an LLM to attribute the text it generates is likely to be crucial for both system developers and users in information-seeking scenarios. In this paper we propose Attributed QA as a key first step in the development of attributed LLMs. We propose an evaluation framework for Attributed QA that uses human annotations based on the Attributable to Identified Sources (AIS) formulation of Rashkin et al. (2021), together with a correlated automatic metric suitable for development settings. We evaluate a broad set of state-of-the-art systems on the task, finding strong performance from directly supervised approaches and outstanding challenges in making use of LLM-generated answers.

