Zero-shot Hybrid Retrieval and Reranking Models for Biomedical Literature

Keith B. Hall
CLEF 2022: Conference and Labs of the Evaluation Forum


We describe our participating system in the document retrieval sub-task (Task B Phase A) at the 10th BioASQ challenge. We designed and implemented a zero-shot hybrid model using only synthetic train-ing data. The model consists of two stages: retrieval and reranking. The retrieval model is a hybrid of sparse and dense retrieval models, which is an extension of our participating system at 8th BioASQ challenge. We improved the dense retrieval model with a T5-based synthetic question generation model and an iterative training strategy involving techniques to filter low-quality synthetic data. In the second stage, we proposed a hybrid reranking model, which is trained using the candidates retrieved from the first stage. We further study if the knowledge from the hybrid reranking model can be transferred to the dense retrieval model through distillation. Our experiments show the proposed hybrid ranking model is effective with different first-stage retrieval models and applying reciprocal rank fusion on them brings additional boosts. Evaluation shows that our model compares favorably with other top participating systems, achieving MAP scores of 0.4696, 0.3984, 0.4586, 0.4089, 0.4065 and 0.1704 on six batches.