Shekoofeh Azizi
I am a staff research scientist and a research lead at Google DeepMind. My research centers on creating a new paradigm of biomedical super-intelligence to accelerate scientific discovery, with a particular focus on cancer. To this end, my team has been pioneering the application of LLMs to new scientific frontiers, including therapeutics and single-cell biology, with models such as TxGemma, Tx-LLM, and C2S-Scale. I am also one of the research leads driving the ambitious effort behind the creation of the Med-PaLM series and Med-Gemini, Google's flagship LLMs meticulously designed for medical applications.
          
        
Authored Publications

Scaling Large Language Models For Next-Generation Single-Cell Analysis
Syed Asad Rizvi, Daniel Levine, Aakash Patel, Shiyang Zhang, Eric Wang, Curtis Jamison Perry, Nicole Mayerli Constante, Sizhuang He, David Zhang, Cerise Tang, Zhuoyang Lyu, Rayyan Darji, Chang Li, Emily Sun, David Jeong, Lawrence Zhao, Jennifer Kwan, David Braun, Brian Hafler, Hattie Chung, Rahul M. Dhodapkar, Paul Jaeger, Jeffrey Ishizuka, David van Dijk
bioRxiv (2025)
          
Single-cell RNA sequencing has transformed our understanding of cellular diversity, yet current single-cell foundation models (scFMs) remain limited in their scalability, flexibility across diverse tasks, and ability to natively integrate textual information. In this work, we build upon the Cell2Sentence (C2S) framework, which represents scRNA-seq profiles as textual “cell sentences,” to train Large Language Models (LLMs) on a corpus comprising over one billion tokens of transcriptomic data, biological text, and metadata. Scaling the model to 27 billion parameters yields consistent improvements in predictive and generative capabilities and supports advanced downstream tasks that require synthesis of information across multi-cellular contexts. Targeted fine-tuning with modern reinforcement learning techniques produces strong performance in perturbation response prediction, natural language interpretation, and complex biological reasoning. This predictive strength directly enabled a dual-context virtual screen that uncovered a striking context split for the kinase inhibitor silmitasertib (CX-4945), suggesting its potential as a synergistic, interferon-conditional amplifier of antigen presentation. Experimental validation in human cell models unseen during training confirmed this hypothesis, demonstrating that C2S-Scale can generate biologically grounded, testable discoveries of context-conditioned biology. C2S-Scale unifies transcriptomic and textual data at unprecedented scales, surpassing both specialized single-cell models and general-purpose LLMs to provide a platform for next-generation single-cell analysis and the development of “virtual cells.”
              
  
Health AI Developer Foundations
Atilla Kiraly, Sebastien Baur, Kenneth Philbrick, Fereshteh Mahvar, Liron Yatziv, Tiffany Chen, Bram Sterling, Nick George, Fayaz Jamil, Jing Tang, Kai Bailey, Faruk Ahmed, Akshay Goel, Abbi Ward, Lin Yang, Shravya Shetty, Daniel Golden, Tim Thelin, Rory Pilgrim, Can "John" Kirmizi
arXiv (2024)
          
Robust medical Machine Learning (ML) models have the potential to revolutionize healthcare by accelerating clinical research, improving workflows and outcomes, and producing novel insights or capabilities. Developing such ML models from scratch is cost-prohibitive and requires substantial compute, data, and time (e.g., expert labeling). To address these challenges, we introduce Health AI Developer Foundations (HAI-DEF), a suite of pre-trained, domain-specific foundation models, tools, and recipes to accelerate building ML for health applications. The models cover various modalities and domains, including radiology (X-rays and computed tomography), histopathology, dermatological imaging, and audio. These models provide domain-specific embeddings that facilitate AI development with less labeled data, shorter training times, and reduced computational costs compared to traditional approaches. In addition, we use a common interface and style across these models, and prioritize usability to enable developers to integrate HAI-DEF efficiently. We present model evaluations across various tasks and conclude with a discussion of their application and evaluation, covering the importance of ensuring efficacy, fairness, and equity. Finally, while HAI-DEF and specifically the foundation models lower the barrier to entry for ML in healthcare, we emphasize the importance of validation with problem- and population-specific data for each desired usage setting. This technical report will be updated over time as more modalities and features are added.
              
  
An intentional approach to managing bias in embedding models
Atilla P. Kiraly, Jungyeon Park, Rory Pilgrim, Charles Lau, Heather Cole-Lewis, Shravya Shetty, Krish Eswaran, Leo Anthony Celi
The Lancet Digital Health, 6 (2024), E126-E130
          
Advances in machine learning for health care have raised concerns in the research community about bias; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard for both algorithms and people to pinpoint. This finding raises a question about how best to design general-purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well-intentioned attempts to prevent the upstream components (GPPEs) from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. Building on previously published data, we present reasons why GPPEs should ideally contain as much information as the original data contain, and we highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
              
  
Towards Generalist Biomedical AI
Danny Driess, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Simon Kornblith, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Sara Mahdavi, Bradley Green, Ewa Dominowska, Joelle Barral, Karan Singhal, Pete Florence
NEJM AI (2024)
          
BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.
METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.
RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.
CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.
              
  
              Towards Conversational Diagnostic AI

                        Khaled Saab
                        Jan Freyberg
                        Ryutaro Tanno
                        Amy Wang
                        Brenna Li
                        Nenad Tomašev
                        Karan Singhal
                        Le Hou
                        Albert Webson
                        Kavita Kulkarni
                        Sara Mahdavi
                        Juro Gottweis
                        Joelle Barral
                        Kat Chou

            Arxiv (2024) (to appear)

              At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM)-based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play-based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically meaningful axes of performance, including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text chat, which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
              
  
              A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models

                        Heather Cole-Lewis
                        Nenad Tomašev
                        Liam McCoy
                        Leo Anthony Celi
                        Alanna Walton
                        Chirag Nagpal
                        Akeiylah DeWitt
                        Philip Mansfield
                        Sushant Prakash
                        Joelle Barral
                        Ivor Horn
                        Karan Singhal

            Nature Medicine (2024)

              Large language models (LLMs) hold promise to serve complex health information needs but also have the potential to introduce harm and exacerbate health disparities. Reliably evaluating equity-related model failures is a critical step toward developing systems that promote health equity. We present resources and methodologies for surfacing biases with potential to precipitate equity-related harms in long-form, LLM-generated answers to medical questions and conduct a large-scale empirical case study with the Med-PaLM 2 LLM. Our contributions include a multifactorial framework for human assessment of LLM-generated answers for biases and EquityMedQA, a collection of seven datasets enriched for adversarial queries. Both our human assessment framework and our dataset design process are grounded in an iterative participatory approach and review of Med-PaLM 2 answers. Through our empirical study, we find that our approach surfaces biases that may be missed by narrower evaluation approaches. Our experience underscores the importance of using diverse assessment methodologies and involving raters of varying backgrounds and expertise. While our approach is not sufficient to holistically assess whether the deployment of an artificial intelligence (AI) system promotes equitable health outcomes, we hope that it can be leveraged and built upon toward a shared goal of LLMs that promote accessible and equitable healthcare.
              
  
              Generative models improve fairness of medical classifiers under distribution shifts

                        Ira Ktena
                        Olivia Wiles
                        Isabela Albuquerque
                        Sylvestre-Alvise Rebuffi
                        Ryutaro Tanno
                        Danielle Belgrave
                        Taylan Cemgil

            Nature Medicine (2024)

              Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and ‘labeling’ by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or the limited availability of clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair, both in distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
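The abstract reports fairness gains as improved diagnostic accuracy within underrepresented groups. One common way to quantify this, assumed here and not necessarily the exact metric used in the paper, is the gap between the best and worst subgroup accuracy. The minimal sketch computes it from any classifier's predictions, so it could be compared for a model trained on real data only versus one trained on real plus synthetic data. The function names `subgroup_accuracy` and `accuracy_gap` are hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def subgroup_accuracy(preds: Sequence[int], labels: Sequence[int],
                      groups: Sequence[Hashable]) -> dict:
    """Accuracy computed separately within each subgroup (illustrative helper)."""
    correct: dict = defaultdict(int)
    total: dict = defaultdict(int)
    for pred, label, group in zip(preds, labels, groups):
        total[group] += 1
        correct[group] += int(pred == label)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(preds: Sequence[int], labels: Sequence[int],
                 groups: Sequence[Hashable]) -> float:
    """Best-minus-worst subgroup accuracy; 0.0 means equal accuracy across subgroups."""
    acc = subgroup_accuracy(preds, labels, groups)
    return max(acc.values()) - min(acc.values())


# Toy data: group "b" stands in for an underrepresented subgroup.
preds = [1, 0, 1, 1, 0, 0]
labels = [1, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(subgroup_accuracy(preds, labels, groups))
print(accuracy_gap(preds, labels, groups))
```

A smaller gap alone does not imply a better model (it can shrink because the best group gets worse), so it is usually read alongside per-group and overall accuracy.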
              
  
              Enhancing diagnostic accuracy of medical AI systems via selective deferral to clinicians

                        Dj Dvijotham
                        Jim Winkens
                        Melih Barsbey
                        Sumedh Ghaisas
                        Robert Stanforth
                        Nick Pawlowski
                        Patricia Strachan
                        Zahra Ahmed
                        Yoram Bachrach
                        Laura Culp
                        Mayank Daswani
                        Jan Freyberg
                        Christopher Kelly
                        Atilla Kiraly
                        Timo Kohlberger
                        Scott Mayer McKinney
                        Basil Mustafa
                        Krzysztof Geras
                        Jan Witowski
                        Zhi Zhen Qin
                        Jacob Creswell
                        Shravya Shetty
                        Terry Spitz
                        Taylan Cemgil

            Nature Medicine (2023)

              AI systems trained using deep learning have been shown to achieve expert-level identification of diseases in multiple medical imaging settings [1,2]. While these results are impressive, they do not accurately reflect the impact of deploying such systems in a clinical context. Because of the safety-critical nature of this domain and the fact that AI systems are not perfect and can make inaccurate assessments, they are predominantly deployed as assistive tools for clinical experts [3]. Although clinicians routinely discuss the diagnostic nuances of medical images with each other, weighing human diagnostic confidence against that of an AI system remains a major unsolved barrier to collaborative decision-making [4]. Furthermore, it has been observed that diagnostic AI models have complementary strengths and weaknesses compared to clinical experts. Yet complementarity and the assessment of relative confidence between the members of a diagnostic team have remained largely unexploited in how AI systems are currently used in medical settings [5].
In this paper, we study the behavior of a team composed of diagnostic AI model(s) and clinician(s) in diagnosing disease. To go beyond the performance level of a standalone AI system, we develop a novel selective deferral algorithm that can learn to decide when to rely on a diagnostic AI model and when to defer to a clinical expert. Using this algorithm, we demonstrate that the composite AI+human system has enhanced accuracy (both sensitivity and specificity) relative to a human-only or an AI-only baseline. We decouple the development of the deferral AI model from training of the underlying diagnostic AI model(s). Development of the deferral AI model only requires i) the predictions of a model(s) on a tuning set of medical images (separate from the diagnostic AI models’ training data), ii) the diagnoses made by clinicians on these images and iii) the ground truth disease labels corresponding to those images.
Our extensive analysis shows that the selective deferral (SD) system exceeds the performance of either clinicians or AI alone in multiple clinical settings: breast and lung cancer screening. For breast cancer screening, double-reading with arbitration (two readers interpreting each mammogram, invoking an arbitrator if needed) is a “gold standard” for performance, never previously exceeded using AI [6]. The SD system exceeds the accuracy of double-reading with arbitration in a large representative UK screening program (25% reduction in false positives despite equivalent true-positive detection and 66% reduction in the requirement for clinicians to read an image), as well as exceeding the performance of a standalone state-of-the-art AI system (40% reduction in false positives with equivalent detection of true positives). In a large US dataset, the SD system exceeds the accuracy of single-reading by board-certified radiologists and a standalone state-of-the-art AI system (32% reduction in false positives despite equivalent detection of true positives and 55% reduction in the clinician workload required). The SD system further outperforms both clinical experts alone and AI alone for the detection of lung cancer in low-dose Computed Tomography images from a large national screening study, with an 11% reduction in false positives while maintaining sensitivity, alongside a 93% reduction in the required clinician workload. Furthermore, the SD system allows controllable trade-offs between sensitivity and specificity and can be tuned to target either specificity or sensitivity as desired for a particular clinical application, or a combination of both.
The system generalizes to multiple distribution shifts, retaining superiority to both the AI system alone and human experts alone. We demonstrate that the SD system retains performance gains even on clinicians not present in the training data for the deferral AI. Furthermore, we test the SD system on a new population where the standalone AI system’s performance significantly degrades. We showcase the few-shot adaptation capability of the SD system by demonstrating that the SD system can obtain superiority to both the standalone AI system and the clinician on the new population after being trained on only 40 cases from the new population. 
Our comprehensive assessment demonstrates that a selective deferral system could significantly improve clinical outcomes in multiple medical imaging applications, paving the way for higher performance clinical AI systems that can leverage the complementarity between clinical experts and medical AI tools.
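The abstract lists the inputs the deferral model needs (AI predictions, clinician diagnoses and ground-truth labels on a separate tuning set) but not its internals. As a loose illustration only, the Python sketch assumes a much simpler two-threshold rule rather than the paper's learned deferral algorithm: the AI's call is trusted when its score is confidently low or high, and the case is deferred to the clinician in between. `DeferralRule`, `evaluate` and `fit_rule` are hypothetical names; the weight `alpha` mimics the sensitivity/specificity trade-off described above, and the deferral rate stands in for clinician workload.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DeferralRule:
    """Hypothetical two-threshold deferral rule (illustrative, not the paper's model)."""
    low: float   # AI scores strictly below this are called negative by the AI
    high: float  # AI scores at or above this are called positive by the AI

    def decide(self, ai_score: float, clinician_call: int) -> tuple[int, bool]:
        """Return (final_call, deferred_to_clinician)."""
        if ai_score < self.low:
            return 0, False
        if ai_score >= self.high:
            return 1, False
        return clinician_call, True  # uncertain band: defer to the clinician


def evaluate(rule: DeferralRule, ai_scores: Sequence[float],
             clinician_calls: Sequence[int],
             labels: Sequence[int]) -> tuple[float, float, float]:
    """Return (sensitivity, specificity, deferral rate) of the AI+clinician team."""
    tp = fn = tn = fp = deferred = 0
    for score, clin, label in zip(ai_scores, clinician_calls, labels):
        call, was_deferred = rule.decide(score, clin)
        deferred += int(was_deferred)
        if label == 1:
            tp += int(call == 1)
            fn += int(call == 0)
        else:
            tn += int(call == 0)
            fp += int(call == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, deferred / len(labels)


def fit_rule(ai_scores: Sequence[float], clinician_calls: Sequence[int],
             labels: Sequence[int], alpha: float = 0.5) -> DeferralRule:
    """Grid-search (low, high) on the tuning set.

    Maximises alpha * sensitivity + (1 - alpha) * specificity; ties are broken
    in favour of the lower deferral rate, i.e. less clinician workload.
    """
    candidates = sorted(set(ai_scores)) + [float("inf")]
    best_rule, best_key = None, None
    for i, low in enumerate(candidates):
        for high in candidates[i:]:  # keep low <= high
            rule = DeferralRule(low, high)
            sens, spec, rate = evaluate(rule, ai_scores, clinician_calls, labels)
            key = (alpha * sens + (1 - alpha) * spec, -rate)
            if best_key is None or key > best_key:
                best_rule, best_key = rule, key
    return best_rule


# Toy tuning set: AI scores, clinician calls and ground truth (1 = disease).
scores = [0.05, 0.10, 0.45, 0.50, 0.55, 0.90, 0.95]
clinician = [1, 0, 1, 0, 1, 0, 1]
truth = [0, 0, 1, 0, 1, 1, 1]
rule = fit_rule(scores, clinician, truth)
print(rule)  # -> DeferralRule(low=0.45, high=0.55)
print(evaluate(rule, scores, clinician, truth))
```

On this toy tuning set, neither the AI at any single threshold nor the clinician alone classifies all seven cases correctly, while the fitted rule does, deferring two of them. The exhaustive grid search is cubic in the number of cases and is written for readability, not scale.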
              
  
              Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation

                        Ryutaro Tanno
                        David Barrett
                        Sumedh Ghaisas
                        Sumanth Dathathri
                        Abi See
                        Johannes Welbl
                        Karan Singhal
                        Rhys May
                        Roy Lee
                        SiWai Man
                        Zahra Ahmed
                        Sara Mahdavi
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Joelle Barral
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ali Eslami
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Danielle Belgrave
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Shravya Shetty
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Po-Sen Huang
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ira Ktena
                      
                    
                  
              
            
          
          
          
          
            arXiv (2023)
          
          
        
        
        
          
          
          
              Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offers clear potential to ameliorate the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data. To evaluate the quality of the AI-generated reports, a group of 16 certified radiologists provided detailed evaluations of AI-generated and human-written reports for chest X-rays from an intensive care setting in the United States and an inpatient setting in India. At least one radiologist (out of two per case) preferred the AI report to the ground-truth report in over 60% of cases for both datasets. Amongst the subset of AI-generated reports that contained errors, the most frequently cited reasons were related to location and finding, whereas for human-written reports, most mistakes were related to severity and finding. This disparity suggested potential complementarity between our AI system and human experts, prompting us to develop an assistive scenario in which Flamingo-CXR generates a first-draft report that is subsequently revised by a clinician. This is the first demonstration of clinician-AI collaboration for report writing, and the resulting reports were judged by at least one radiologist to be equivalent or preferable to reports written by experts alone in 80% of inpatient cases and 60% of intensive care cases.
              
  
          
        
      
    
        
          
            
              Towards Accurate Differential Diagnosis with Large Language Models
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Daniel McDuff
                      
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Amy Wang
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Karan Singhal
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Yash Sharma
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Kavita Kulkarni
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Le Hou
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sara Mahdavi
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sushant Prakash
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Anupam Pathak
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Shwetak Patel
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ewa Dominowska
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Juro Gottweis
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Joelle Barral
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Kat Chou
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jake Sunshine
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
          
          
          
          
            arXiv (2023)
          
          
        
        
        
          
          
          
              An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning and evaluate its ability to generate a DDx alone or as an aid to clinicians. Twenty clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, p = 0.04). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) than for clinicians without its assistance (36.1%) (McNemar's test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has the potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation of its ability to empower physicians and widen patients' access to specialist-level expertise.
              
  