Vivek Natarajan
Vivek Natarajan is a Research Scientist at Google DeepMind, where he leads research at the intersection of AI, science and medicine.
In particular, Vivek was the lead researcher behind Med-PaLM and Med-PaLM 2, the first AI systems to achieve, respectively, a passing score and an expert-level score on US Medical Licensing Examination (USMLE)-style questions. Med-PaLM was published in Nature in 2023 and has been featured in Scientific American, The Wall Street Journal, The Economist, STAT News, CNBC, Forbes and New Scientist, among others.
Vivek co-leads Project AMIE at Google with Dr Alan Karthikesalingam. Project AMIE is a research effort aiming to build and democratize conversational, multimodal, diagnostic and empathetic medical superintelligence. Two papers from Project AMIE were recently published in Nature, with the AI system outperforming primary care physicians in simulated medical consultations.
Finally, Vivek recently co-led the development of the AI co-scientist, a system designed to act as a virtual AI collaborator for scientists.
Outside Google, Vivek also serves part-time on the executive education faculty at the Harvard T.H. Chan School of Public Health.
      Authored Publications
              Towards Conversational AI for Disease Management
Khaled Saab, David Stutz, Kavita Kulkarni, Sara Mahdavi, Joelle Barral, James Manyika, Ryutaro Tanno, Adam Rodman
            arXiv (2025)
While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning (including disease progression, therapeutic response, and safe medication prescription) remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease across multiple patient visits, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians, and scored better both in the preciseness of treatments and investigations and in the alignment and grounding of its management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher-difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.
              Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM
Brenna Li, Amy Wang, Patricia Strachan, Julie Anne Seguin, Sami Lachgar, Karyn Schroeder, Renee Wong
            Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, pp. 10
Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation helpful in understanding their medical situation and in alleviating their concerns. Through qualitative coding of the conversation transcripts, we provide insight into the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings.
              Towards Conversational Diagnostic AI
Khaled Saab, Jan Freyberg, Ryutaro Tanno, Amy Wang, Brenna Li, Nenad Tomašev, Karan Singhal, Le Hou, Albert Webson, Kavita Kulkarni, Sara Mahdavi, Juro Gottweis, Joelle Barral, Kat Chou
            arXiv (2024) (to appear)
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM)-based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text chat, which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
              An intentional approach to managing bias in embedding models
Atilla P. Kiraly, Jungyeon Park, Rory Pilgrim, Charles Lau, Heather Cole-Lewis, Shravya Shetty, Krish Eswaran, Leo Anthony Celi
            The Lancet Digital Health, 6 (2024), E126-E130
Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard for both algorithms and people to pinpoint. This finding raises a question about how best to design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well-intentioned attempts to prevent the upstream components (GPPEs) from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. Building on previously published data, we present reasons to support the view that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
              Towards Generalist Biomedical AI
Danny Driess, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Simon Kornblith, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Sara Mahdavi, Bradley Green, Ewa Dominowska, Joelle Barral, Karan Singhal, Pete Florence
            NEJM AI (2024)
          
          
        
        
        
          
          
          
              BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights across many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.
METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.
RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.
CONCLUSIONS: Although considerable work is needed to validate these models in real-world use cases and to understand whether cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems. 
              
  
          
        
      
    
        
        
          
          
          
              Artificial intelligence (AI) holds the promise of transforming healthcare by improving patient outcomes, increasing accessibility and efficiency, and decreasing the cost of care. Realizing this vision of a healthier world for everyone everywhere requires partnerships and trust between healthcare systems, clinicians, payers, technology companies, pharmaceutical companies, and governments to bring innovations in machine learning and artificial intelligence to patients. Google is one example of a technology company that is partnering with healthcare systems, clinicians, and researchers to develop technology solutions that will directly improve the lives of patients. In this chapter we share landmark trials of the use of AI in healthcare. We also describe the application of our novel system for organizing information to unify data in electronic health records (EHRs) and bring an integrated view of patient records to clinicians. We discuss our consumer-focused innovation in dermatology, which helps guide search journeys for personalized information about skin conditions. Finally, we share a perspective on how to embed ethics and a concern for all patients into the development of AI.
              
  
          
        
      
    
        
          
            
              Towards Physician-Level Medical Question Answering with Large Language Models
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Karan Singhal
                      
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Juro Gottweis
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Le Hou
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Kevin Clark
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Heather Cole-Lewis
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Amy Wang
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sami Lachgar
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Philip Mansfield
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sushant Prakash
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Bradley Green
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ewa Dominowska
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Nenad Tomašev
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Renee Wong
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sara Mahdavi
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Joelle Barral
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
          
          
          
          
            arXiv (2023) (to appear)
          
          
        
        
        
          
          
          
              Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge.
Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score on US Medical Licensing Examination (USMLE)-style questions, with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain fine-tuning, and prompting strategies including a novel ensemble refinement approach.
Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19 percentage points and setting a new state of the art. We also observed performance approaching or exceeding the state of the art on the MedMCQA, PubMedQA, and MMLU clinical topics datasets.
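The two MedQA scores quoted above (67.2% for Med-PaLM, 86.5% for Med-PaLM 2) can be compared in absolute or in relative terms; the worked arithmetic below uses only those two figures.

$$
86.5\% - 67.2\% = 19.3 \text{ percentage points (absolute)}, \qquad \frac{19.3}{67.2} \approx 0.287 \;\;(\approx 28.7\% \text{ relative})
$$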
We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In a pairwise comparative ranking on 1,066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p < 0.001). We also observed significant improvements over Med-PaLM on every evaluation axis (p < 0.001) on newly introduced datasets of 240 long-form "adversarial" questions designed to probe LLM limitations.
While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.
              
  
          
        
      
    
        
          
            
              Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Ryutaro Tanno
                      
                    
                
              
            
              
                
                  
                    
                    
                      
                        David Barrett
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sumedh Ghaisas
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sumanth Dathathri
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Abi See
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Johannes Welbl
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Karan Singhal
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Rhys May
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Roy Lee
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        SiWai Man
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Zahra Ahmed
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sara Mahdavi
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Joelle Barral
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ali Eslami
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Danielle Belgrave
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Shravya Shetty
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Po-Sen Huang
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ira Ktena
                      
                    
                  
              
            
          
          
          
          
            arXiv (2023)
          
          
        
        
        
          
          
          
              Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offers clear potential for ameliorating the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data. To evaluate the quality of the AI-generated reports, a group of 16 certified radiologists provide detailed evaluations of AI-generated and human-written reports for chest X-rays from an intensive care setting in the United States and an inpatient setting in India. At least one radiologist (out of two per case) preferred the AI report to the ground-truth report in over 60% of cases for both datasets. Among the subset of AI-generated reports that contain errors, the most frequently cited reasons were related to location and finding, whereas for human-written reports, most mistakes were related to severity and finding. This disparity suggested potential complementarity between our AI system and human experts, prompting us to develop an assistive scenario in which Flamingo-CXR generates a first-draft report that is subsequently revised by a clinician. This is the first demonstration of clinician-AI collaboration for report writing, and at least one radiologist judged the resulting reports to be equivalent or preferable to reports written by experts alone in 80% of inpatient cases and 60% of intensive care cases.
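The headline figures above aggregate two radiologist judgments per case with an "at least one rater" rule. As a hedged illustration, assuming each rater's verdict is recorded as one of three hypothetical labels ("ai", "equal", "human"), the sketch below computes that aggregate alongside a stricter "both raters" variant. The labels, the function names `at_least_one_rate` and `both_rate`, and the toy ratings are assumptions for illustration, not the study's data format.

```python
from typing import Collection, Sequence, Tuple

# Hypothetical verdict labels for one rater comparing an AI report with the reference report.
AI, EQUAL, HUMAN = "ai", "equal", "human"


def at_least_one_rate(pairs: Sequence[Tuple[str, str]], accepted: Collection[str]) -> float:
    """Fraction of cases where at least one of the two raters gave an accepted verdict."""
    if not pairs:
        raise ValueError("need at least one rated case")
    hits = sum(1 for a, b in pairs if a in accepted or b in accepted)
    return hits / len(pairs)


def both_rate(pairs: Sequence[Tuple[str, str]], accepted: Collection[str]) -> float:
    """Stricter variant: both raters must give an accepted verdict."""
    if not pairs:
        raise ValueError("need at least one rated case")
    hits = sum(1 for a, b in pairs if a in accepted and b in accepted)
    return hits / len(pairs)


# Made-up ratings for five cases, two radiologists per case.
ratings = [(AI, HUMAN), (HUMAN, HUMAN), (EQUAL, AI), (EQUAL, EQUAL), (HUMAN, EQUAL)]

if __name__ == "__main__":
    print(at_least_one_rate(ratings, {AI}))         # AI report preferred by >= 1 rater: 0.4
    print(at_least_one_rate(ratings, {AI, EQUAL}))  # equivalent or preferred by >= 1 rater: 0.8
    print(both_rate(ratings, {AI, EQUAL}))          # equivalent or preferred by both raters: 0.4
```

By construction, the "at least one" rate can never be lower than the "both raters" rate for the same accepted verdicts, so the two summaries answer different questions and should not be compared as if they measured the same thing.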
              
  
          
        
      
    
        
          
            
              Enhancing diagnostic accuracy of medical AI systems via selective deferral to clinicians
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Dj Dvijotham
                      
                    
                
              
            
              
                
                  
                    
                    
                      
                        Jim Winkens
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Melih Barsbey
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sumedh Ghaisas
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Robert Stanforth
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Nick Pawlowski
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Patricia Strachan
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Zahra Ahmed
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Yoram Bachrach
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Laura Culp
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Mayank Daswani
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jan Freyberg
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Christopher Kelly
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Atilla Kiraly
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Timo Kohlberger
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Scott Mayer McKinney
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Basil Mustafa
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Krzysztof Geras
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jan Witowski
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Zhi Zhen Qin
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jacob Creswell
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Shravya Shetty
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Terry Spitz
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Taylan Cemgil
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
          
          
          
          
            Nature Medicine (2023)
          
          
        
        
        
          
          
          
              AI systems trained using deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings [1,2]. While these results are impressive, they do not accurately reflect the impact of deploying such systems in a clinical context. Because this domain is safety-critical and AI systems are imperfect and can make inaccurate assessments, they are predominantly deployed as assistive tools for clinical experts [3]. Although clinicians routinely discuss the diagnostic nuances of medical images with each other, weighing human diagnostic confidence against that of an AI system remains a major unsolved barrier to collaborative decision-making [4]. Furthermore, diagnostic AI models have been observed to have strengths and weaknesses complementary to those of clinical experts. Yet this complementarity, and the assessment of relative confidence between the members of a diagnostic team, have remained largely unexploited in how AI systems are currently used in medical settings [5].
In this paper, we study the behavior of a team composed of diagnostic AI model(s) and clinician(s) in diagnosing disease. To go beyond the performance level of a standalone AI system, we develop a novel selective deferral algorithm that learns when to rely on a diagnostic AI model and when to defer to a clinical expert. Using this algorithm, we demonstrate that the composite AI+human system has enhanced accuracy (both sensitivity and specificity) relative to a human-only or an AI-only baseline. We decouple the development of the deferral AI model from the training of the underlying diagnostic AI model(s). Developing the deferral AI model requires only i) the predictions of the diagnostic AI model(s) on a tuning set of medical images (separate from the diagnostic AI models’ training data), ii) the diagnoses made by clinicians on these images, and iii) the ground-truth disease labels for those images.
Our extensive analysis shows that the selective deferral (SD) system exceeds the performance of either clinicians or AI alone in multiple clinical settings: breast and lung cancer screening. For breast cancer screening, double-reading with arbitration (two readers interpret each mammogram, and an arbitrator is invoked if needed) is a “gold standard” for performance that had never previously been exceeded using AI [6]. The SD system exceeds the accuracy of double-reading with arbitration in a large representative UK screening program (a 25% reduction in false positives with equivalent true-positive detection, and a 66% reduction in the need for clinicians to read images), and it also exceeds the performance of a standalone state-of-the-art AI system (a 40% reduction in false positives with equivalent detection of true positives). In a large US dataset, the SD system exceeds the accuracy of single-reading by board-certified radiologists and of a standalone state-of-the-art AI system (a 32% reduction in false positives with equivalent detection of true positives, and a 55% reduction in required clinician workload). The SD system also outperforms both clinical experts alone and AI alone for the detection of lung cancer in low-dose computed tomography (CT) images from a large national screening study, with an 11% reduction in false positives while maintaining sensitivity and a 93% reduction in required clinician workload. Furthermore, the SD system allows controllable trade-offs between sensitivity and specificity and can be tuned to target specificity, sensitivity, or a combination of both, as desired for a particular clinical application.
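Double-reading with arbitration, the baseline protocol described above, can be sketched as a small decision rule. This is a simplified illustration assuming binary recall decisions (1 = recall, 0 = no recall) and an arbitrator that is consulted only on disagreement; `double_read` and `workload_reduction` are hypothetical names, and real screening programmes layer further rules on top.

```python
from typing import Callable, Tuple


def double_read(reader1: int, reader2: int, arbitrator: Callable[[], int]) -> Tuple[int, int]:
    """Combine two independent reads; consult the arbitrator only on disagreement.

    Returns (final decision, number of human reads used for this case).
    """
    if reader1 == reader2:
        return reader1, 2
    return arbitrator(), 3


def workload_reduction(reads_with_system: int, reads_baseline: int) -> float:
    """Fractional reduction in human reads relative to a baseline protocol."""
    if reads_baseline <= 0:
        raise ValueError("baseline must involve at least one read")
    return 1.0 - reads_with_system / reads_baseline


if __name__ == "__main__":
    # Made-up cases: (reader 1, reader 2, what the arbitrator would decide).
    cases = [(1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 1, 1)]
    total_reads = 0
    for r1, r2, arb in cases:
        decision, reads = double_read(r1, r2, lambda a=arb: a)
        total_reads += reads
    print(total_reads)                          # 2 + 2 + 3 + 3 = 10 human reads
    print(workload_reduction(4, total_reads))   # e.g. a scheme needing only 4 reads
```

Passing the arbitrator as a callable keeps the third read lazy, so the read count reflects only the reads that actually happen.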
The system generalizes across multiple distribution shifts, retaining its advantage over both the AI system alone and human experts alone. We demonstrate that the SD system retains its performance gains even for clinicians not present in the training data for the deferral AI. Furthermore, we test the SD system on a new population where the standalone AI system’s performance degrades significantly. We showcase the SD system’s few-shot adaptation capability by showing that, after being trained on only 40 cases from the new population, it outperforms both the standalone AI system and the clinician on that population. 
Our comprehensive assessment demonstrates that a selective deferral system could significantly improve clinical outcomes in multiple medical imaging applications, paving the way for higher performance clinical AI systems that can leverage the complementarity between clinical experts and medical AI tools.
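The abstract describes a deferral model learned from three ingredients: AI predictions on a tuning set, clinician diagnoses, and ground-truth labels. The sketch below is not the paper's selective deferral algorithm; it is a deliberately simple stand-in, assuming a single AI score in [0, 1] per case, that learns two thresholds by grid search. Confident negatives and positives are decided by the AI, and everything in between is deferred to the clinician. `DeferralRule`, `fit_rule`, the `sens_weight` knob and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class DeferralRule:
    """Hypothetical two-threshold rule: the AI decides confident cases, the clinician the rest."""
    low: float   # AI score below this -> AI calls the case negative
    high: float  # AI score at or above this -> AI calls the case positive

    def decide(self, ai_score: float, clinician_call: int) -> Tuple[int, bool]:
        """Return (final decision, whether the case was deferred to the clinician)."""
        if ai_score < self.low:
            return 0, False
        if ai_score >= self.high:
            return 1, False
        return clinician_call, True


def evaluate(rule: DeferralRule, scores: Sequence[float], clinician: Sequence[int],
             labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, fraction of cases deferred) on labelled data."""
    tp = tn = fp = fn = deferred = 0
    for score, call, label in zip(scores, clinician, labels):
        pred, was_deferred = rule.decide(score, call)
        deferred += was_deferred
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 0 and label == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    sensitivity = tp / max(tp + fn, 1)
    specificity = tn / max(tn + fp, 1)
    return sensitivity, specificity, deferred / max(len(labels), 1)


def fit_rule(scores: Sequence[float], clinician: Sequence[int], labels: Sequence[int],
             sens_weight: float = 0.5, grid_size: int = 21) -> DeferralRule:
    """Grid-search thresholds on a tuning set.

    Maximises sens_weight * sensitivity + (1 - sens_weight) * specificity,
    breaking ties in favour of fewer deferrals (less clinician workload).
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    best_rule, best_key = None, None
    for low in grid:
        for high in grid:
            if high < low:
                continue
            rule = DeferralRule(low, high)
            sens, spec, defer_rate = evaluate(rule, scores, clinician, labels)
            key = (sens_weight * sens + (1 - sens_weight) * spec, -defer_rate)
            if best_key is None or key > best_key:
                best_rule, best_key = rule, key
    return best_rule


# Made-up tuning set: the AI is reliable at the extremes, the clinician in the middle.
scores = [0.05, 0.1, 0.2, 0.45, 0.5, 0.55, 0.8, 0.9, 0.95]
clinician = [1, 0, 0, 1, 0, 1, 1, 0, 1]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    rule = fit_rule(scores, clinician, labels)
    sens, spec, defer_rate = evaluate(rule, scores, clinician, labels)
    # prints: sensitivity=1.00 specificity=1.00 deferred=0.22
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} deferred={defer_rate:.2f}")
```

On this toy data neither the AI alone (no single threshold separates the middle cases) nor the clinician alone is error-free, while the combined rule is. Raising `sens_weight` shifts the search toward sensitivity, loosely mirroring the controllable sensitivity/specificity trade-off mentioned in the abstract.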
              
  
          
        
      
    
        
          
            
Large Language Models Encode Clinical Knowledge

Karan Singhal, Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Perry Payne, Martin Seneviratne, Paul Gamble, Christopher Kelly, Abubakr Abdelrazig Hassan Babiker, Nathanael Schaerli, Aakanksha Chowdhery, Philip Mansfield, Dina Demner-Fushman, Katherine Chou, Juraj Gottweis, Nenad Tomašev, Alvin Rajkomar, Joelle Barral

Nature (2023)
          
        
        
        
          
          
          
              Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries, and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion-parameter LLM) and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today’s models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.