Large Language Models Provide Human-Level Medical Text Snippet Labeling

Benny Li
Haiyang Yu
Chang Liu
Rupesh Kartha
Yuchen Liu
Fan Zhang
Ibtihel Amara
2024

Abstract

This study evaluates the proficiency of Large Language Models (LLMs) in accurately labeling clinical document excerpts. Our focus is on the assignment of potential or confirmed diagnoses and medical procedures to snippets of medical text sourced from unstructured clinical patient records. We examine how the performance of LLMs compares with that of human annotators in classifying these excerpts. Using a few-shot, chain-of-thought prompting approach on the MIMIC-III dataset, Med-PaLM 2 achieves annotation accuracy comparable to that of human annotators, with a precision of approximately 92% relative to the gold-standard labels established by human experts.
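
To make the prompting setup concrete, the sketch below assembles a few-shot, chain-of-thought prompt of the general kind the abstract describes. The exemplar snippets, rationales, and label strings are hypothetical placeholders (not drawn from MIMIC-III or from the paper's actual prompts), and the model call itself is omitted, so this is a minimal illustration rather than the authors' implementation.

```python
# A minimal sketch of few-shot, chain-of-thought prompting for labeling
# clinical text snippets with potential or confirmed diagnoses/procedures.
# All exemplars below are invented placeholders, not MIMIC-III data.

FEW_SHOT_EXEMPLARS = [
    {
        "snippet": "Pt admitted with crushing substernal chest pain; troponin elevated.",
        "reasoning": "Elevated troponin with chest pain indicates myocardial injury, "
                     "so acute myocardial infarction is a confirmed diagnosis.",
        "label": "confirmed diagnosis: acute myocardial infarction",
    },
    {
        "snippet": "Plan: consider EGD if melena recurs.",
        "reasoning": "The EGD is conditional on recurrence, so it is a potential "
                     "procedure rather than a performed one.",
        "label": "potential procedure: esophagogastroduodenoscopy",
    },
]


def build_prompt(snippet: str) -> str:
    """Assemble a few-shot chain-of-thought prompt for one clinical snippet."""
    parts = [
        "Label the clinical text snippet with potential or confirmed "
        "diagnoses and procedures. Reason step by step before answering."
    ]
    # Each exemplar pairs a snippet with an explicit reasoning chain and label,
    # demonstrating the desired output format to the model.
    for ex in FEW_SHOT_EXEMPLARS:
        parts.append(
            f"Snippet: {ex['snippet']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Label: {ex['label']}"
        )
    # End with the target snippet; the model is expected to continue
    # with its own reasoning chain followed by a label.
    parts.append(f"Snippet: {snippet}\nReasoning:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # The LLM call is omitted; substitute your own model client here.
    print(build_prompt("CT abdomen ordered to rule out appendicitis."))
```

In a setup like this, the model's generated reasoning is discarded at evaluation time and only the final label is compared against the gold-standard human annotations.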