Multi-stage Training with Improved Negative Contrast for Neural Passage Retrieval

Gustavo Adolfo Hernandez Abrego

Ji Ma

Jianmo Ni

Jing Lu

Yinfei Yang

EMNLP 2021, Association for Computational Linguistics (2021), pp. 6091-6103


Abstract

In this paper, we explore the effects of negative sampling in dual encoder models used to retrieve passages for automatic question answering. We examine four negative sampling strategies that complement the straightforward random sampling of negatives typically used to train dual encoder models. Of the four strategies, three are based on retrieval and one on heuristics. Of the three retrieval-based strategies, two rely on the semantic similarity between the actual passage and its alternatives, and one relies on the lexical overlap between them. In our experiments, we train the dual encoder models in two stages: pre-training with synthetic data and fine-tuning with domain-specific data. Negative sampling is applied in both stages. Our negative sampling is particularly useful when we augment the generic data for pre-training with synthetic examples. We evaluate our approach on three passage retrieval tasks for open-domain question answering. Although no single sampling strategy clearly works best on all three tasks, all of them contribute to improving the contrast between the actual retrieval and its alternatives. Furthermore, mixing the negatives from different strategies can achieve performance on par with the best-performing strategy in all tasks. Our results establish a new state of the art on two of the open-domain question answering tasks that we evaluated.
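
As a rough illustration of the ideas in the abstract, the sketch below shows a softmax contrastive loss for one question and a helper that mixes negatives drawn from several strategies. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`contrastive_loss`, `mix_negatives`), the pool names, the round-robin mixing rule, and the toy two-dimensional embeddings are all hypothetical, and the abstract does not specify the actual loss or mixing procedure.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query, positive, negatives):
    """Softmax cross-entropy of the positive passage against the negatives.

    Assumption: dual encoder scores are plain dot products of embeddings.
    """
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]


def mix_negatives(pools, k, seed=0):
    """Draw up to k distinct negatives, cycling over the strategy pools.

    `pools` maps a (hypothetical) strategy name to a list of passage ids.
    """
    rng = random.Random(seed)
    shuffled = {name: rng.sample(ids, len(ids)) for name, ids in pools.items()}
    mixed, seen = [], set()
    while len(mixed) < k and any(shuffled.values()):
        for name in shuffled:
            if shuffled[name] and len(mixed) < k:
                candidate = shuffled[name].pop()
                if candidate not in seen:  # skip ids already taken from another pool
                    seen.add(candidate)
                    mixed.append(candidate)
    return mixed


# Toy usage: a negative close to the question yields a larger loss,
# i.e. a stronger training signal, than an unrelated one.
question = [1.0, 0.0]
gold = [1.0, 0.0]
easy_loss = contrastive_loss(question, gold, [[0.0, 1.0]])
hard_loss = contrastive_loss(question, gold, [[0.9, 0.0]])

pools = {
    "random": ["p1", "p2"],
    "lexical": ["p2", "p3"],
    "semantic": ["p4"],
}
mixed = mix_negatives(pools, k=3)
```

In this toy setting the negative that scores close to the gold passage produces the larger loss, which is one way to read "improving the contrast" between the actual passage and its alternatives. The round-robin draw is only one possible way to combine pools; other mixing ratios would fit the same interface.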
