Lan Huong Nguyen
Authored Publications
LLMs Accelerate Annotation for Medical Information Extraction
Akshay Goel
Almog Gueta
Omry Gilon
Chang Liu
Xiaohong Hao
Bolous Jaber
Shashir Reddy
Rupesh Kartha
Jean Steiner
Machine Learning for Health (ML4H), PMLR (2023)
The unstructured nature of clinical notes within electronic health records often conceals vital patient-related information, making it challenging to access or interpret. To uncover this hidden information, specialized Natural Language Processing (NLP) models are required. However, training these models necessitates large amounts of labeled data, a process that is both time-consuming and costly when relying solely on human experts for annotation. In this paper, we propose an approach that combines Large Language Models (LLMs) with human expertise to create an efficient method for generating ground truth labels for medical text annotation. By utilizing LLMs in conjunction with human annotators, we significantly reduce the human annotation burden, enabling the rapid creation of labeled datasets. We rigorously evaluate our method on a medical information extraction task, demonstrating that our approach not only substantially cuts down on human intervention but also maintains high accuracy. The results highlight the potential of using LLMs to improve the utilization of unstructured clinical data, allowing for the swift deployment of tailored NLP solutions in healthcare.
Building a Clinically-Focused Problem List From Medical Notes
Birju Patel
Cathy Cheung
Liwen Xu
Peter Clardy
Rachana Fellinger
LOUHI 2022: The 13th International Workshop on Health Text Mining and Information Analysis (2022)
Clinical notes often contain vital information not observed in other structured data, but their unstructured nature can lead to critical patient-related information being lost. To make sure this valuable information is utilized for patient care, algorithms that summarize notes into a problem list are often proposed. Focusing on identifying medically-relevant entities in the free-form text, these solutions are often detached from a canonical ontology and do not allow downstream use of the detected text-spans. As a solution, we present here a system for generating a canonical problem list from medical notes, consisting of two major stages. At the first stage, annotation, we use a transformer model to detect all clinical conditions which are mentioned in a single note. These clinical conditions are then grounded to a predefined ontology, and are linked to spans in the text. At the second stage, summarization, we aggregate over the set of clinical conditions detected on all of the patient's notes, and produce a concise patient summary that organizes their important conditions.
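The two stages described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' system: a simple dictionary lookup stands in for the transformer model, and the ontology, concept ids, and the names `TOY_ONTOLOGY`, `annotate`, and `summarize` are all hypothetical, chosen only to show how detected conditions might be grounded to concepts, linked to text spans, and aggregated across a patient's notes.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy ontology: surface form -> (concept id, canonical name).
# The ids are invented for illustration and belong to no real vocabulary.
TOY_ONTOLOGY = {
    "type 2 diabetes": ("C001", "Type 2 diabetes mellitus"),
    "t2dm": ("C001", "Type 2 diabetes mellitus"),
    "hypertension": ("C002", "Hypertension"),
    "high blood pressure": ("C002", "Hypertension"),
}


@dataclass(frozen=True)
class Mention:
    note_id: str
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    concept_id: str
    name: str


def annotate(note_id, text):
    """Stage 1 (annotation): find conditions in one note, ground each to
    the toy ontology, and keep the character span it came from.
    A dictionary lookup is used here in place of a transformer model."""
    lowered = text.lower()
    mentions = []
    for surface, (concept_id, name) in TOY_ONTOLOGY.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, lowered):
            mentions.append(
                Mention(note_id, match.start(), match.end(), concept_id, name)
            )
    return sorted(mentions, key=lambda m: m.start)


def summarize(notes):
    """Stage 2 (summarization): aggregate grounded conditions over all of
    a patient's notes into one entry per concept, most-mentioned first."""
    grouped = defaultdict(list)
    for note_id, text in notes.items():
        for mention in annotate(note_id, text):
            grouped[(mention.concept_id, mention.name)].append(mention)
    problem_list = [
        {"concept_id": cid, "name": name, "mentions": found}
        for (cid, name), found in grouped.items()
    ]
    problem_list.sort(key=lambda p: (-len(p["mentions"]), p["name"]))
    return problem_list


notes = {
    "n1": "Patient has T2DM and hypertension.",
    "n2": "High blood pressure is well controlled.",
}
for problem in summarize(notes):
    spans = [(m.note_id, m.start, m.end) for m in problem["mentions"]]
    print(problem["concept_id"], problem["name"], spans)
```

Because each entry keeps its `(note_id, start, end)` spans, a downstream consumer can trace every item in the summary back to the exact text that produced it, which is the property the abstract says span-only approaches lack.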