QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension

David Dohan
Rui Zhao
ICLR (2018)

Abstract

Current end-to-end Q&A models are primarily based on recurrent neural networks with attention. Despite their success, these models are often slow for both training and inference. We propose a novel Q&A model that does not require recurrent networks yet achieves equivalent or better performance than existing models. Our model is simple in that it consists exclusively of attention and convolutions. We present a thorough study of architectural choices that improve the accuracy of this simple model.
We also propose a novel data augmentation technique that not only enhances the training examples but also diversifies the phrasing of the sentences. It results in immediate improvement in the accuracy. This technique is of independent interest that it can be readily applied to other natural language processing tasks.
On the SQuAD dataset, our model is 3x faster in training and 10x faster in inference. The model achieves 82.2 F1 score on the development set, which is on par with best documented result of 81.8.