Google Research


ICLR (2020) (to appear)


We wish to put forward an approach for accessing text as a knowledge base which is useful for question-answering (QA). This approach relies centrally on development of a differentiable operator which allows us to traverse textual data like a ``virtual'' KB. The core of the approach is a neural module that inputs and outputs sets of entities: in particular, this module uses maximum inner product search (MIPS) on a special index to map a set of entities $X$ to all entities $Y$ related to something in $X$ (by some specified relations), as witnessed by some text in the corpus. For multi-hop questions, the set of output entities $Y$ can be again used recursively as the input to a second copy of the module, enabling us to answer complex questions. This module is differentiable, so the full system can be trained completely end-to-end using gradient based methods. Thus, we name it DrKIT: Differentiable Reasoning over a virtual Knowledge base of Indexed Text. We describe a pretraining scheme for the index mention encoder by generating hard negative examples using existing knowledge bases, and we show that DrKIT improves accuracy by $9$ points on 3-hop questions in the MetaQA dataset, cutting the gap between text-based and KB-based methods by $70\%$. DrKIT is also very efficient, processing 10x more queries per second than existing state-of-the-art multi-hop QA systems.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work