Jing Lu
Authored Publications
RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses
Jianmo Ni
Proc. of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (2023)
Pretrained language models such as BERT have been shown to be exceptionally effective for text ranking. However, there are limited studies on how to leverage more powerful sequence-to-sequence models such as T5. Existing attempts usually formulate text ranking as a classification problem and rely on postprocessing to obtain a ranked list. In this paper, we propose RankT5 and study two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they not only can directly output ranking scores for each query-document pair, but also can be fine-tuned with "pairwise" or "listwise" ranking losses to optimize ranking performance. Our experiments show that the proposed models with ranking losses can achieve substantial ranking performance gains on different public text ranking data sets. Moreover, ranking models fine-tuned with listwise ranking losses have better zero-shot ranking performance on out-of-domain data than models fine-tuned with classification losses.
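To make the idea of a listwise ranking loss concrete, here is a minimal sketch of a softmax cross-entropy loss computed over one list of candidate documents. The abstract does not say which listwise loss is used, so the choice of softmax cross-entropy, the function name `listwise_softmax_loss`, and the toy scores are all assumptions for illustration.

```python
import math

def listwise_softmax_loss(scores, labels):
    """Softmax cross-entropy over one list of candidate documents.

    scores: model ranking scores, one per document in the list.
    labels: non-negative relevance labels aligned with scores.
    (Hypothetical helper; not taken from the paper's code.)
    """
    # Log-sum-exp with max subtraction for numerical stability.
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    # Negative log-likelihood of the relevant documents under the softmax.
    return -sum(y * (s - log_norm) for s, y in zip(scores, labels))

# The relevant document (label 1) is scored highest in `good`, lowest in `bad`.
good = listwise_softmax_loss([2.0, 0.5, -1.0], [1, 0, 0])
bad = listwise_softmax_loss([-1.0, 0.5, 2.0], [1, 0, 0])
print(good < bad)
```

Unlike a per-document classification loss, this loss depends on the scores of all documents in the list at once, so raising the score of a non-relevant document increases the loss even if the relevant document's score is unchanged.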
Learning List-Level Domain-Invariant Representations for Ranking
Ruicheng Xian
Hamed Zamani
Han Zhao
37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Domain adaptation aims to transfer the knowledge learned on (data-rich) source domains to (low-resource) target domains, and a popular method is invariant representation learning, which matches and aligns the data distributions on the feature space. Although this method is studied extensively and applied on classification and regression problems, its adoption on ranking problems is sporadic, and the few existing implementations lack theoretical justifications. This paper revisits invariant representation learning for ranking. Upon reviewing prior work, we found that they implement what we call item-level alignment, which aligns the distributions of the items being ranked from all lists in aggregate but ignores their list structure. However, the list structure should be leveraged, because it is intrinsic to ranking problems where the data and the metrics are defined and computed on lists, not the items by themselves. To close this discrepancy, we propose list-level alignment—learning domain-invariant representations at the higher level of lists. The benefits are twofold: it leads to the first domain adaptation generalization bound for ranking, in turn providing theoretical support for the proposed method, and it achieves better empirical transfer performance for unsupervised domain adaptation on ranking tasks, including passage reranking.
Out-of-Domain Semantics to the Rescue! Zero-Shot Hybrid Retrieval Models
The 44th European Conference on Information Retrieval (ECIR) (2022)
Deep retrieval models based on pre-trained language models (e.g., BERT) have achieved superior performance over lexical retrieval models (e.g., BM25) in many passage retrieval tasks. However, limited work has been done to generalize a deep retrieval model to other tasks and domains. In this work, we carefully select five datasets, including two in-domain datasets and three out-of-domain datasets with different levels of domain shift, and study the generalization of a deep model in a zero-shot setting. Our findings show that the performance of a deep retrieval model deteriorates significantly when the target domain is very different from the source domain that the model was trained on. In contrast, lexical models are more robust across domains. We thus propose a simple yet effective framework to integrate lexical and deep retrieval models. Our experiments demonstrate that these two models are complementary, even when the deep model is weaker in the out-of-domain setting. The combined model obtains an average of 20.4% relative gain over the deep retrieval model, and an average of 9.54% over the lexical model in three out-of-domain datasets.
View details
Preview abstract
We describe our participating system in the document retrieval sub-task (Task B Phase A) at the 10th BioASQ challenge. We designed and implemented a zero-shot hybrid model using only synthetic training data. The model consists of two stages: retrieval and reranking. The retrieval model is a hybrid of sparse and dense retrieval models, which is an extension of our participating system at the 8th BioASQ challenge. We improved the dense retrieval model with a T5-based synthetic question generation model and an iterative training strategy involving techniques to filter low-quality synthetic data. In the second stage, we proposed a hybrid reranking model, which is trained using the candidates retrieved from the first stage. We further study whether the knowledge from the hybrid reranking model can be transferred to the dense retrieval model through distillation. Our experiments show that the proposed hybrid ranking model is effective with different first-stage retrieval models, and that applying reciprocal rank fusion to them brings additional boosts. Evaluation shows that our model compares favorably with other top participating systems, achieving MAP scores of 0.4696, 0.3984, 0.4586, 0.4089, 0.4065 and 0.1704 on six batches.
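Reciprocal rank fusion, mentioned above, can be sketched as follows. This is a generic illustration and not the system's actual code: the function name, the document IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of ranked lists, best document first.
    k: smoothing constant; 60 is a conventional default (assumption here).
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by document ID.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical outputs of a sparse and a dense first-stage retriever.
sparse_run = ["d1", "d2", "d3"]
dense_run = ["d3", "d1", "d4"]
fused_run = reciprocal_rank_fusion([sparse_run, dense_run])
print(fused_run)  # ['d1', 'd3', 'd2', 'd4']
```

Because the fusion uses only ranks, it needs no score calibration between the sparse and dense models, which is one reason it is a convenient way to combine heterogeneous retrievers.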
View details
ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference
Cicero Nogueira dos Santos
Yi Tay
ACL: Findings 2022 (2022)
State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms, however, are not without flaws, i.e., running the model on all query-document pairs at inference-time incurs a significant computational cost. This paper proposes a new training and inference paradigm for re-ranking. We propose to fine-tune a pretrained encoder-decoder model in the form of document-to-query generation. Subsequently, we show that this encoder-decoder architecture can be decomposed into a decoder-only language model during inference. This results in significant inference time speedups since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference. Our experiments show that this new paradigm achieves results that are comparable to the more expensive cross-attention ranking approaches while being up to 6.8X faster. We believe this work paves the way for more efficient neural rankers that leverage large pretrained models.
Large Dual Encoders Are Generalizable Retrievers
Jianmo Ni
Zhuyun Dai
Vincent Zhao
Yi Luan
Keith B. Hall
Ming-Wei Chang
Yinfei Yang
(2022)
It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization. In this paper, we challenge this belief by scaling up the size of the dual encoder model while keeping the bottleneck embedding size fixed. With multi-stage training, surprisingly, scaling up the model size brings significant improvement on a variety of retrieval tasks, especially for out-of-domain generalization. Experimental results show that our dual encoders, Generalizable T5-based dense Retrievers (GTR), outperform existing sparse and dense retrievers on the BEIR dataset (Thakur et al., 2021) significantly. Most surprisingly, our ablation study finds that GTR is very data efficient, as it only needs 10% of MS Marco supervised data to achieve the best out-of-domain performance.
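The dot-product bottleneck described above can be illustrated with a small sketch. The embeddings below are made-up two-dimensional vectors standing in for encoder outputs, and the names `dot` and `retrieve` are hypothetical; a real dual encoder would produce the vectors with separate (or shared) query and passage encoders.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_vec, passage_vecs, top_k=2):
    """Rank passages by dot-product score against the query embedding.

    passage_vecs: dict mapping passage ID to its precomputed embedding.
    """
    scored = [(dot(query_vec, vec), pid) for pid, vec in passage_vecs.items()]
    # Highest score first; break ties by passage ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:top_k]]

# Toy embeddings (assumed values, for illustration only).
passages = {"p1": [0.9, 0.1], "p2": [0.1, 0.9], "p3": [0.5, 0.5]}
top = retrieve([1.0, 0.0], passages)
print(top)  # ['p1', 'p3']
```

Since the query and passage interact only through this single dot product, passage embeddings can be computed offline and indexed, and scaling the encoders does not change the size of the vectors being compared.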
Multi-stage Training with Improved Negative Contrast for Neural Passage Retrieval
Jianmo Ni
Yinfei Yang
EMNLP 2021, Association for Computational Linguistics (2021), pp. 6091-6103
In this paper we explore the effects of negative sampling in dual encoder models used to retrieve passages in automatic question answering tasks. We explore four negative sampling strategies that complement the straightforward random sampling of negatives, typically used to train dual encoder models. Out of the four strategies, three are based on retrieval and one on heuristics. Of the three retrieval-based strategies, two are based on the semantic similarity between the actual passage and its alternatives and another one is based on the lexical overlap between them. In our experiments we train the dual encoder models in two stages: pre-training with synthetic data and fine-tuning with domain-specific data. Negative sampling is applied in both stages. Our negative sampling is particularly useful when we augment the generic data for pre-training with synthetic examples. We evaluate our approach in three passage retrieval tasks for open-domain question answering. Even though it is not evident that there is one single sampling strategy that works best in all three tasks, it is clear that they all contribute to improving the contrast between the actual retrieval and its alternatives. Furthermore, mixing the negatives from different strategies can achieve performance on par with the best performing strategy in all tasks. Our results establish a new state-of-the-art level of performance on two of the open-domain question answering tasks that we evaluated.