Neural Ranking Models with Weak Supervision

Mostafa Dehghani; Hamed Zamani; Aliaksei Severyn; Jaap Kamps; W. Bruce Croft

Proceedings of The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM (2017)

Abstract

Despite the impressive improvements achieved by unsupervised
deep neural networks in computer vision, natural language processing,
and speech recognition tasks, such improvements have not
generally been observed in ranking for information retrieval. The
reason might be related to the complexity of the ranking problem,
in the sense that it is not obvious how to learn from queries and
documents when no supervised signal is available. Hence, in this
paper, we propose to train a neural ranking model from a weak
supervision signal, which is a training signal that can be obtained
automatically without human labeling or any external resources
(e.g., click data). To this end, we use the output of a well-known unsupervised
ranking model, such as BM25, as a weak supervision
signal. We further train a set of simple yet effective ranking models
based on feed-forward neural networks. We study their effectiveness
under various learning scenarios (point-wise and pair-wise
models) and with different input representations (ranging from encoding
query-document pairs as dense or sparse vectors to using word
embedding representations). We train our networks on 5 million
unique queries obtained from the publicly available AOL query
logs and two standard collections: a homogeneous news collection
(Robust) and a heterogeneous large-scale web collection (ClueWeb).
Our experiments indicate that feeding raw data to the networks
and letting them learn representations of the input leads to
strong performance, with MAP improvements of over 13% and 35%
relative to the BM25 model on the Robust and ClueWeb collections,
respectively. Our findings suggest that neural ranking
models can greatly benefit from large amounts of weakly labeled
data that can be easily obtained from unsupervised IR models.
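
The core idea, turning an unsupervised ranker's output into training labels, can be illustrated with a minimal sketch. It assumes whitespace-tokenized text, standard Okapi BM25 with k1=1.2 and b=0.75 and a common non-negative IDF variant, and a toy three-document corpus. The helpers features, train and rank are hypothetical. A linear scorer over two hand-built features stands in for the feed-forward networks described above, and the pair-wise hinge loss is one common choice for pair-wise learning, not necessarily the paper's exact objective.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized doc for a tokenized query (the weak labeler)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency of each term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def weak_pairs(query, docs):
    """Pair-wise weak labels: (i, j) means BM25 scores doc i strictly above doc j."""
    s = bm25_scores(query, docs)
    return [(i, j) for i in range(len(docs)) for j in range(len(docs)) if s[i] > s[j]]


def features(query, doc):
    """Hypothetical hand-built features: query-term coverage and log(1 + matched tf)."""
    tf = Counter(doc)
    q = set(query)
    coverage = sum(1 for t in q if tf[t] > 0) / len(q)
    return [coverage, math.log1p(sum(tf[t] for t in query))]


def dot(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


def hinge_loss(w, xs, pairs):
    """Sum over weak pairs of max(0, 1 - (f(d_i) - f(d_j)))."""
    return sum(max(0.0, 1.0 - (dot(w, xs[i]) - dot(w, xs[j]))) for i, j in pairs)


def train(query, docs, epochs=20, lr=0.01):
    """Full-batch subgradient descent on the pair-wise hinge loss of a linear scorer."""
    xs = [features(query, d) for d in docs]
    pairs = weak_pairs(query, docs)
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for i, j in pairs:
            if dot(w, xs[i]) - dot(w, xs[j]) < 1.0:  # hinge active: subgradient is -(x_i - x_j)
                for k in range(len(w)):
                    grad[k] -= xs[i][k] - xs[j][k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w, xs, pairs


def rank(w, xs):
    """Document indices sorted by learned score, best first."""
    return sorted(range(len(xs)), key=lambda i: -dot(w, xs[i]))


query = "weak supervision ranking".split()
docs = [
    "neural ranking with weak supervision".split(),
    "ranking documents for a query".split(),
    "recipes for sourdough bread".split(),
]
w, xs, pairs = train(query, docs)
print(pairs)        # → [(0, 1), (0, 2), (1, 2)]
print(rank(w, xs))  # → [0, 1, 2]
```

No human judgments enter the loop: every training pair comes from BM25. Fitting the raw BM25 scores with a squared-error loss instead would give a point-wise counterpart. The sketch deliberately skips the learned input representations that the abstract identifies as the main source of improvement.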
