Tao Tu
Research Areas
Authored Publications

Towards Conversational AI for Disease Management
Khaled Saab, David Stutz, Kavita Kulkarni, Sara Mahdavi, Joelle Barral, James Manyika, Ryutaro Tanno, Adam Rodman
arXiv (2025)
While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning (including disease progression, therapeutic response, and safe medication prescription) remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease across multiple patient visits, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians, and scored better both in the precision of its treatments and investigations and in the alignment and grounding of its management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher-difficulty questions. While further research is needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.

AI mirrors experimental science to uncover a novel mechanism of gene transfer crucial to bacterial evolution
Juro Gottweis, Jose R Penades, Alexander Daryin, Artiom Myaskovsky, Tiago R D Costa
Cell (2025)
AI models have been proposed for hypothesis generation, but testing their ability to drive high-impact research is challenging, since an AI-generated hypothesis can take decades to validate. Here, we challenge the ability of a recently developed LLM-based platform to generate high-level hypotheses by posing a question that took years to resolve experimentally but remained unpublished: How could capsid-forming phage-inducible chromosomal islands (cf-PICIs) spread across bacterial species? Remarkably, the AI’s top-ranked hypothesis matched our experimentally confirmed mechanism: cf-PICIs hijack diverse phage tails to expand their host range. We critically assess the AI’s five highest-ranked hypotheses, showing that some opened new research avenues in our laboratories. We benchmark its performance against other LLMs and outline best practices for integrating AI into scientific discovery. Our findings suggest that AI can act not just as a computational tool, but as a creative engine, accelerating discovery and reshaping how we generate and test scientific hypotheses.

Towards Conversational Diagnostic AI
Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Nenad Tomašev, Karan Singhal, Le Hou, Albert Webson, Kavita Kulkarni, Sara Mahdavi, Juro Gottweis, Joelle Barral, Kat Chou
arXiv (2024) (to appear)
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM)-based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text chat, which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.

Towards Generalist Biomedical AI
Danny Driess, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Simon Kornblith, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Sara Mahdavi, Bradley Green, Ewa Dominowska, Joelle Barral, Karan Singhal, Pete Florence
NEJM AI (2024)
BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights across many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.
METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest X-ray reports.
RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians expressed a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.
CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and to understand whether cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.

Large Language Models Encode Clinical Knowledge
Karan Singhal, Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Perry Payne, Martin Seneviratne, Paul Gamble, Christopher Kelly, Abubakr Abdelrazig Hassan Babiker, Nathanael Schaerli, Aakanksha Chowdhery, Philip Mansfield, Dina Demner-Fushman, Katherine Chou, Juraj Gottweis, Nenad Tomašev, Alvin Rajkomar, Joelle Barral
Nature (2023)
              Preview abstract
          
          
              Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.

Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation
Ryutaro Tanno
David Barrett
Sumedh Ghaisas
Sumanth Dathathri
Abi See
Johannes Welbl
Karan Singhal
Rhys May
Roy Lee
SiWai Man
Zahra Ahmed
Sara Mahdavi
Joelle Barral
Ali Eslami
Danielle Belgrave
Shravya Shetty
Po-Sen Huang
Ira Ktena

arXiv (2023)

Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offers clear potential to ameliorate the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data. To evaluate the quality of the AI-generated reports, a group of 16 certified radiologists provided detailed evaluations of AI-generated and human-written reports for chest X-rays from an intensive care setting in the United States and an inpatient setting in India. At least one radiologist (out of two per case) preferred the AI report to the ground-truth report in over 60% of cases for both datasets. Amongst the subset of AI-generated reports that contained errors, the most frequently cited reasons were related to the location and finding, whereas for human-written reports, most mistakes were related to severity and finding. This disparity suggested potential complementarity between our AI system and human experts, prompting us to develop an assistive scenario in which Flamingo-CXR generates a first-draft report, which is subsequently revised by a clinician. This is the first demonstration of clinician-AI collaboration for report writing, and the resultant reports were assessed by at least one radiologist to be equivalent to or preferred over reports written by experts alone in 80% of inpatient cases and 60% of intensive care cases.

Towards Accurate Differential Diagnosis with Large Language Models
Daniel McDuff
Amy Wang
Karan Singhal
Yash Sharma
Kavita Kulkarni
Le Hou
Sara Mahdavi
Sushant Prakash
Anupam Pathak
Shwetak Patel
Ewa Dominowska
Juro Gottweis
Joelle Barral
Kat Chou
Jake Sunshine

arXiv (2023)

An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning and evaluate its ability to generate a DDx alone or as an aid to clinicians. Twenty clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs. 33.6%, p = 0.04). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) than for clinicians without its assistance (36.1%; McNemar's test: 45.7, p < 0.01) and for clinicians with search (44.4%; McNemar's test: 4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has the potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation of its ability to empower physicians and widen patients' access to specialist-level expertise.

Towards Physician-Level Medical Question Answering with Large Language Models
Karan Singhal
Juro Gottweis
Le Hou
Kevin Clark
Heather Cole-Lewis
Amy Wang
Sami Lachgar
Philip Mansfield
Sushant Prakash
Bradley Green
Ewa Dominowska
Nenad Tomašev
Renee Wong
Sara Mahdavi
Joelle Barral

arXiv (2023) (to appear)

Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge.
Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE)-style questions, with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain fine-tuning, and prompting strategies including a novel ensemble refinement approach.
Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state of the art. We also observed performance approaching or exceeding the state of the art across the MedMCQA, PubMedQA, and MMLU clinical topics datasets.
We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1,066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions designed to probe LLM limitations.
While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

Automated LOINC Standardization Using Pre-trained Large Language Models
Eric Loreaux
Emma Chesley
Paul Gamble
Martin Seneviratne
Ming-Jun Chen

PMLR (2022), pp. 343-355

Harmonization of local source concepts to standard clinical terminologies is a prerequisite for multi-center data aggregation and sharing. Challenges in automating the mapping process stem from the idiosyncratic source encoding schemes adopted by different health systems and the lack of large publicly available training data. In this study, we aim to develop a scalable and generalizable machine learning tool to facilitate standardizing laboratory observations to the Logical Observation Identifiers Names and Codes (LOINC). Specifically, we leverage contextual embeddings from pre-trained T5 models and propose a two-stage fine-tuning strategy based on contrastive learning to enable learning in a few-shot setting without manual feature engineering. Our method uses the unlabeled general LOINC ontology and data augmentation to achieve impressive performance in retrieving the most relevant LOINC targets when only a limited amount of labeled data is available. We further show that our model generalizes well to unseen targets. Taken together, our approach shows great potential to reduce manual effort in LOINC standardization and can be easily extended to mapping other terminologies.