- Tom Kwiatkowski
- Jennimaria Palomaki
- Olivia Redfield
- Michael Collins
- Ankur Parikh
- Chris Alberti
- Danielle Epstein
- Illia Polosukhin
- Matthew Kelcey
- Jacob Devlin
- Kenton Lee
- Kristina N. Toutanova
- Llion Jones
- Ming-Wei Chang
- Andrew Dai
- Jakob Uszkoreit
- Quoc Le
- Slav Petrov
Abstract
We present the Natural Questions corpus, a question answering dataset. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations, 7,830 examples with 5-way annotations for development data, and a further 7,842 examples 5-way annotated sequestered as test data. We present experiments validating the quality of the data. We also describe an analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.
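As a rough illustration of the annotation structure described in the abstract, the sketch below models a single example: a question paired with a Wikipedia page, one or more annotations, each containing a long answer (typically a paragraph), zero or more short answers (entities), or a null annotation when no answer is present. The field names and the sample instance are hypothetical, not the corpus's actual release schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    # Paragraph from the Wikipedia page, or None for a "null" (no long answer).
    long_answer: Optional[str]
    # Short answers are one or more entities found within the long answer; may be empty.
    short_answers: List[str] = field(default_factory=list)


@dataclass
class Example:
    question: str                 # real, anonymized, aggregated search query
    wikipedia_page_title: str     # page drawn from the top 5 search results
    annotations: List[Annotation] # 1 annotation for training, 5-way for dev/test


# Hypothetical instance showing a null annotation (no long or short answer found).
example = Example(
    question="example query text",          # placeholder, not a real query
    wikipedia_page_title="Natural Questions",
    annotations=[Annotation(long_answer=None)],
)
```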