Leveraging Semantic and Lexical Matching to Improve the Recall of Retrieval Systems: A Hybrid Approach

Cheng Li

Marc Najork

Mike Bendersky

Mingyang Zhang

Saar Kuzi

arXiv (2020)

Abstract

Search engines often follow a two-phase paradigm: in the first step, an initial set of documents is retrieved (the retrieval step), and in the second step, the documents are ranked to obtain the final result list (the re-ranking step). The focus of this paper is on improving the retrieval step (measured mainly by recall) using deep neural network-based approaches. While deep neural networks have been shown to improve the performance of the re-ranking step, there is little literature on using them to improve the retrieval step. Previous work on deep neural networks for IR usually applies a simple lexical retrieval model (e.g., BM25) for the retrieval step and focuses on the re-ranking step. In this paper, we propose and study a hybrid retrieval approach, which leverages both semantic (deep neural network-based) and lexical (keyword matching-based, like BM25) matching techniques. The main idea is to perform semantic and lexical retrieval in parallel, and then to combine the result lists to generate the initial result set for re-ranking. An empirical evaluation, using a public TREC collection, shows that the result lists generated by the semantic retrieval model often contain a substantial number of relevant documents not covered by the lists generated by the lexical model. Further analysis of these relevant documents shows that they often also exhibit different characteristics than the documents retrieved lexically, attesting to the complementary nature of the two approaches. Finally, the experiments show that combining the two result lists can significantly increase the recall of the result list, that the retrieval step can be greatly improved, and that these improvements are highly robust.
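The idea of combining two result lists to raise recall can be illustrated with a minimal sketch. The abstract does not specify how the lists are merged, so the round-robin interleaving below is only an assumption chosen for simplicity, not the method used in the paper. The function names `merge_result_lists` and `recall`, the document ids, and the relevance judgments are all hypothetical toy values.

```python
from typing import Iterable, List, Set


def merge_result_lists(lexical: List[str], semantic: List[str], k: int) -> List[str]:
    """Interleave two ranked lists rank by rank, drop duplicates, keep the first k ids.

    Assumption: round-robin interleaving is an illustrative merging rule only.
    """
    merged: List[str] = []
    seen: Set[str] = set()
    for i in range(max(len(lexical), len(semantic))):
        for ranked in (lexical, semantic):
            if i < len(ranked) and ranked[i] not in seen:
                seen.add(ranked[i])
                merged.append(ranked[i])
    return merged[:k]


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Toy data, made up for illustration (not taken from the paper).
lexical_hits = ["d1", "d2", "d3", "d4"]   # e.g. output of a BM25-style retriever
semantic_hits = ["d5", "d2", "d6", "d1"]  # e.g. output of an embedding-based retriever
relevant_docs = {"d1", "d3", "d5", "d6"}

hybrid_hits = merge_result_lists(lexical_hits, semantic_hits, k=6)

print(recall(lexical_hits, relevant_docs))   # 0.5
print(recall(semantic_hits, relevant_docs))  # 0.75
print(recall(hybrid_hits, relevant_docs))    # 1.0
```

In this toy case each retriever finds relevant documents that the other misses, so the merged list covers all of them. That is the complementary behavior the abstract describes, although real gains depend on the collection, the models, and the merging rule.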

Research Areas

Information Retrieval and the Web
