Medical Encoder: A Lightweight Dual-Encoder for Clinical Information Retrieval

Benny Li
Haiyang Yu
Chang Liu
Rupesh Kartha
Yuchen Liu
Fan Zhang
Ibtihel Amara
2024

Abstract

Clinical information retrieval systems play a crucial role in helping care providers find relevant medical information in large collections of clinical documents and make accurate diagnosis and treatment decisions. This study proposes a lightweight medical retriever designed to efficiently fetch relevant clinical text snippets online using medical condition queries. The retrieval system processes each query in under one millisecond, enabling prompt retrieval of medical information for timely decision-making. To optimize retrieval performance, we assess the impact of various modeling and data generation strategies. Our results show that by leveraging Large Language Models and applying appropriate fine-tuning strategies, we achieve a well-performing dual encoder with high retrieval scores (recall of 73% and 50% for top-10 condition and snippet retrieval, respectively). We further investigate potential factors affecting the retrieval model's behavior and observe that the quality and coverage of the provided input query set are the primary determinants of the medical encoder's generalizability.
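As an illustration of the kind of evaluation the abstract reports, the sketch below ranks snippets by dot-product similarity to a query embedding, which is the usual scoring scheme for a dual encoder, and computes recall over the top-k results. It is a minimal sketch, not the authors' implementation: the toy vectors, the function names (`top_k`, `recall_at_k`), and the exact recall definition (fraction of relevant items that appear in the top k) are assumptions made for illustration.

```python
from typing import Sequence, Set, List


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int) -> List[int]:
    """Return indices of the k documents with the highest dot-product score.

    Ties are broken by the lower document index.
    """
    scored = [(dot(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def recall_at_k(retrieved: Sequence[int], relevant: Set[int]) -> float:
    """Fraction of relevant items found among the retrieved ones (assumed definition)."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical toy embeddings; a real system would obtain these from the
# query encoder and the snippet encoder, respectively.
snippet_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_vec = [1.0, 0.2]
relevant_ids = {0, 1}

retrieved_ids = top_k(query_vec, snippet_vecs, k=2)
score = recall_at_k(retrieved_ids, relevant_ids)
```

In a dual encoder, snippet embeddings can be computed offline, so only the query is encoded and scored at request time. The abstract does not describe the paper's indexing or scoring details, so the exhaustive scan here stands in for whatever the system actually uses.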