Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors

Radu Soricut; Nan Ding

Arxiv, https://arxiv.org/abs/1612.04342 (2016)

Abstract

We present a dual contribution to the task of machine reading-comprehension: a technique for creating large machine-comprehension (MC) datasets using paragraph-vector models, and a novel hybrid neural-network architecture that combines the representation power of recursive neural networks with the discriminative power of fully-connected multi-layered networks. We use the MC-dataset generation technique to build a dataset of around 2 million examples, for which we empirically determine the ceiling of human performance (around 91% accuracy) as well as the performance of a variety of computer models. Among all the models we experimented with, our hybrid neural-network architecture achieves the highest performance (83.2% accuracy). The remaining gap to the human-performance ceiling leaves room for future model improvements.
