Jonathan Berant
Authored Publications
SEMQA: Semi-Extractive Multi-Source Question Answering
Haitian Sun
NAACL (2024) (to appear)
Recently proposed long-form question answering (QA) systems, supported by large language models (LLMs), have shown promising capabilities. Yet, attributing and verifying their generated abstractive answers can be difficult, and automatically evaluating their accuracy remains an ongoing challenge.
In this paper, we introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion. Specifically, Semi-extractive Multi-source QA (SEMQA) requires models to output a comprehensive answer that mixes factual quoted spans, copied verbatim from the given input sources, with non-factual free-text connectors that glue these spans together into a single cohesive passage. This setting bridges the gap between the outputs of well-grounded but constrained extractive QA systems and the more fluent but harder-to-attribute answers of fully abstractive systems. In particular, it enables a new mode for language models that leverages their advanced language generation capabilities, while also producing fine-grained in-line attributions by design that are easy to verify, interpret, and evaluate. To study this task, we create the first dataset of this kind, with human-written semi-extractive answers to natural and generated questions, and define text-based evaluation metrics. Experimenting with several LLMs in various settings, we find this task to be surprisingly challenging, demonstrating the importance of our work for developing and studying such consolidation capabilities.
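The verbatim-copy requirement on quoted spans is what makes SEMQA-style answers easy to verify: each quoted span can simply be string-matched against the source it cites. Below is a minimal sketch of such a check, assuming a hypothetical annotation format in which quoted spans are wrapped in double quotes and followed by a bracketed source index; the dataset's actual annotation scheme and evaluation metrics may differ.

```python
import re

def verify_semi_extractive_answer(answer, sources):
    """Check that every quoted span in a semi-extractive answer is copied
    verbatim from the source it cites.

    Assumes (hypothetically) that quoted spans are written as
    "span text" [k], where k indexes into `sources`. The real SEMQA
    format may differ; this only illustrates the verification idea.
    """
    # Extract all (span, source index) pairs from the answer.
    quoted = re.findall(r'"([^"]+)"\s*\[(\d+)\]', answer)
    results = []
    for span, idx in quoted:
        source = sources[int(idx)]
        results.append((span, span in source))  # verbatim substring check
    return results


if __name__ == "__main__":
    sources = [
        "The Eiffel Tower was completed in 1889.",
        "It stands 330 metres tall after the 2022 antenna addition.",
    ]
    answer = ('The tower "was completed in 1889" [0] and today '
              '"stands 330 metres tall" [1].')
    for span, ok in verify_semi_extractive_answer(answer, sources):
        print(f"{'OK ' if ok else 'MISS'} {span!r}")
```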
Learning Recurrent Span Representations for Extractive Question Answering
Shimi Salant
arXiv 1611.01436 (2017)
The reading comprehension task, in which questions are asked about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of candidates pre-defined manually or through the use of an external NLP pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset, in which the answers can be arbitrary strings from the supplied text. In this paper, we focus on this answer extraction task, presenting a novel model architecture that efficiently builds fixed-length representations of all spans in the evidence document with a recurrent network. We show that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers. Our approach improves upon the best published results of Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.'s baseline by > 50%.
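The abstract's central claim is that scoring an explicit, fixed-length representation for every candidate answer span works better than factoring the prediction into independent per-word or start/end decisions. The sketch below illustrates that idea under simplified assumptions: a plain BiLSTM encoder and boundary-state concatenation as the span representation. The paper's actual recurrent span representations are more involved, and all names and dimensions here are illustrative only.

```python
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    """Minimal sketch of scoring explicit span representations.

    A BiLSTM encodes the evidence document; every candidate span (i, j)
    up to a maximum length gets a fixed-length representation by
    concatenating the contextual states of its endpoints, which an MLP
    then scores. This is a simplification, not the paper's architecture.
    """

    def __init__(self, emb_dim=64, hidden=64, max_span_len=10):
        super().__init__()
        self.max_span_len = max_span_len
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.scorer = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, token_embeddings):
        # token_embeddings: (seq_len, emb_dim) for a single document.
        states, _ = self.encoder(token_embeddings.unsqueeze(0))
        states = states.squeeze(0)                      # (seq_len, 2 * hidden)
        seq_len = states.size(0)

        spans, reps = [], []
        for i in range(seq_len):
            for j in range(i, min(i + self.max_span_len, seq_len)):
                spans.append((i, j))
                reps.append(torch.cat([states[i], states[j]]))
        scores = self.scorer(torch.stack(reps)).squeeze(-1)  # one score per span
        return spans, scores


if __name__ == "__main__":
    doc = torch.randn(20, 64)          # stand-in for 20 embedded tokens
    model = SpanScorer()
    spans, scores = model(doc)
    best = scores.argmax().item()
    print("highest-scoring span:", spans[best])
```

Because every span is scored as a unit, the model can capture interactions between a span's start and end that independent start/end classifiers cannot, at the cost of enumerating O(sequence length × maximum span length) candidates.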