

Bhuwan Dhingra
Graham Neubig
Ruslan Salakhutdinov
Vidhisha Balachandran
William Weston Cohen
ICLR (2020) (to appear)


We propose an approach for accessing text as a knowledge base (KB) for question answering (QA). The approach centers on a differentiable operator that lets us traverse textual data like a "virtual" KB. Its core is a neural module that takes a set of entities as input and returns a set of entities as output: the module uses maximum inner product search (MIPS) on a special index to map a set of entities $X$ to all entities $Y$ that are related to some entity in $X$ by specified relations, as witnessed by some text in the corpus. For multi-hop questions, the output set $Y$ can be fed recursively as input to a second copy of the module, which lets us answer complex questions. Because the module is differentiable, the full system can be trained end-to-end with gradient-based methods. We therefore name it DrKIT: Differentiable Reasoning over a virtual Knowledge base of Indexed Text. We describe a pretraining scheme for the index's mention encoder that generates hard negative examples from existing knowledge bases, and we show that DrKIT improves accuracy by $9$ points on 3-hop questions in the MetaQA dataset, cutting the gap between text-based and KB-based methods by $70\%$. DrKIT is also very efficient, processing 10x more queries per second than existing state-of-the-art multi-hop QA systems.
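
As a rough illustration of the entity-set-to-entity-set module described above, the following minimal NumPy sketch chains two hops over a toy corpus. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `follow`, the matrices `ent2mention` and `mention2ent`, the two-dimensional mention embeddings, the exponential weighting of scores, and the brute-force top-k search that stands in for a real MIPS index.

```python
import numpy as np

def follow(x, query, ent2mention, mention_emb, mention2ent, k=2):
    """One hop: map a weighted entity set x to a weighted entity set y.

    Hypothetical helper for illustration only, not the authors' code.
    """
    # Weight of each mention that co-occurs with some entity in x.
    cooc = x @ ent2mention                    # shape: (num_mentions,)
    # Brute-force inner products; a real system would query a MIPS index.
    scores = mention_emb @ query              # shape: (num_mentions,)
    topk = np.argsort(-scores)[:k]
    mask = np.zeros_like(scores)
    mask[topk] = 1.0
    # Keep mentions that are both near x and relevant to the query.
    z = cooc * mask * np.exp(scores)
    # Aggregate mention weights onto the entities the mentions refer to.
    y = z @ mention2ent                       # shape: (num_entities,)
    total = y.sum()
    return y / total if total > 0 else y

# Toy corpus (assumed): 4 entities, 3 mentions.
# ent2mention[e, m] = 1 if entity e co-occurs with mention m.
ent2mention = np.array([[1, 0, 0],
                        [0, 1, 0],
                        [0, 0, 1],
                        [0, 0, 0]], dtype=float)
# mention2ent[m, e] = 1 if mention m refers to entity e.
mention2ent = np.array([[0, 1, 0, 0],
                        [0, 0, 1, 0],
                        [0, 0, 0, 1]], dtype=float)
# One embedding per mention; each axis stands for one relation type.
mention_emb = np.array([[1, 0],
                        [0, 1],
                        [1, 0]], dtype=float)

x0 = np.array([1, 0, 0, 0], dtype=float)      # start at entity 0
q1 = np.array([1, 0], dtype=float)            # relation for hop 1
q2 = np.array([0, 1], dtype=float)            # relation for hop 2

y1 = follow(x0, q1, ent2mention, mention_emb, mention2ent)
y2 = follow(y1, q2, ent2mention, mention_emb, mention2ent)  # second hop reuses the module
```

In this toy setup the first hop moves all weight from entity 0 to entity 1, and the second hop moves it on to entity 2. Because each hop is built from matrix products and elementwise operations on the retained mentions, the same computation written in an automatic differentiation framework would let gradients flow back to the query vectors, which is the property the abstract relies on for end-to-end training.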
