End-to-End Query Term Weighting

Karan Samel; Cheng Li; Weize Kong; Tao Chen; Mingyang Zhang; Shaleen Gupta; Swaraj Khadanga; Wensong Xu; Xingyu Wang; Kashyap Kolipaka; Mike Bendersky; Marc Najork

End-to-End Query Term Weighting

Karan Samel

Cheng Li

Weize Kong

Tao Chen

Mingyang Zhang

Shaleen Gupta

Swaraj Khadanga

Wensong Xu

Xingyu Wang

Kashyap Kolipaka

Mike Bendersky

Marc Najork

Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '23) (2023)

Download Google Scholar

Abstract

Bag-of-words based lexical retrieval systems are still the most commonly used methods for real-world search applications. Recently deep learning methods have shown promising results to improve this retrieval performance but are expensive to run in an online fashion, non-trivial to integrate into existing production systems, and might not generalize well in out-of-domain retrieval scenarios. Instead, we build on top of lexical retrievers by proposing a Term Weighting BERT (TW-BERT) model. TW-BERT learns to predict the weight for individual n-gram (e.g., uni-grams and bi-grams) query input terms. These inferred weights and terms can be used directly by a retrieval system to perform a query search. To optimize these term weights, TW-BERT incorporates the scoring function used by the search engine, such as BM25, to score query-document pairs. Given sample query-document pairs we can compute a ranking loss over these matching scores, optimizing the learned query term weights in an end-to-end fashion. Aligning TW-BERT with search engine scorers minimizes the changes needed to integrate it into existing production applications, whereas existing deep learning based search methods would require further infrastructure optimization and hardware requirements. The learned weights can be easily utilized by standard lexical retrievers and by other retrieval techniques such as query expansion. We show that TW-BERT improves retrieval performance over strong term weighting baselines within MSMARCO and in out-of-domain retrieval on TREC datasets.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

End-to-End Query Term Weighting

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs