Andrew B. Sellergren
Research Areas
Authored Publications
Sort By
An intentional approach to managing bias in embedding models
Atilla P. Kiraly
Jungyeon Park
Rory Pilgrim
Charles Lau
Heather Cole-Lewis
Shravya Shetty
Krish Eswaran
Leo Anthony Celi
The Lancet Digital Health, 6 (2024), E126-E130
Preview abstract
Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components—GPPEs—from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
View details
ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
Shawn Xu
Lin Yang
Christopher Kelly
Timo Kohlberger
Martin Ma
Atilla Kiraly
Sahar Kazemzadeh
Zakkai Melamed
Jungyeon Park
Patricia MacWilliams
Chuck Lau
Christina Chen
Mozziyar Etemadi
Sreenivasa Raju Kalidindi
Kat Chou
Shravya Shetty
Daniel Golden
Rory Pilgrim
Krish Eswaran
arxiv (2023)
Preview abstract
Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
View details
Consensus, dissensus and synergy between clinicians and specialist foundation models in radiology report generation
Ryutaro Tanno
David Barrett
Sumedh Ghaisas
Sumanth Dathathri
Abi See
Johannes Welbl
Karan Singhal
Rhys May
Roy Lee
SiWai Man
Zahra Ahmed
Sara Mahdavi
Joelle Barral
Ali Eslami
Danielle Belgrave
Shravya Shetty
Po-Sen Huang
Ira Ktena
Arxiv (2023)
Preview abstract
Radiology reports are an instrumental part of modern medicine, informing key clinical decisions such as diagnosis and treatment. The worldwide shortage of radiologists, however, restricts access to expert care and imposes heavy workloads, contributing to avoidable errors and delays in report delivery. While recent progress in automated report generation with vision-language models offer clear potential in ameliorating the situation, the path to real-world adoption has been stymied by the challenge of evaluating the clinical quality of AI-generated reports. In this study, we build a state-of-the-art report generation system for chest radiographs, Flamingo-CXR, by fine-tuning a well-known vision-language foundation model on radiology data. To evaluate the quality of the AI-generated reports, a group of 16 certified radiologists provide detailed evaluations of AI-generated and human written reports for chest X-rays from an intensive care setting in the United States and an inpatient setting in India. At least one radiologist (out of two per case) preferred the AI report to the ground truth report in over 60% of cases for both datasets. Amongst the subset of AI-generated reports that contain errors, the most frequently cited reasons were related to the location and finding, whereas for human written reports, most mistakes were related to severity and finding. This disparity suggested potential complementarity between our AI system and human experts, prompting us to develop an assistive scenario in which Flamingo-CXR generates a first-draft report, which is subsequently revised by a clinician. This is the first demonstration of clinician-AI collaboration for report writing, and the resultant reports are assessed to be equivalent or preferred by at least one radiologist to reports written by experts alone in 80% of in-patient cases and 60% of intensive care cases.
View details
Simplified Transfer Learning for Chest X-ray Models using Less Data
Christina Chen
AJ Maschinot
Jenny Huang
Chuck Lau
Sreenivasa Raju Kalidindi
Mozziyar Etemadi
Florencia Garcia-Vicente
David Melnick
Krish Eswaran
Neeral Beladia
Dilip Krishnan
Shravya Ramesh Shetty
Radiology (2022)
Preview abstract
Background: Developing deep learning models for radiology requires large data sets and substantial computational resources. Data set size limitations can be further exacerbated by distribution shifts, such as rapid changes in patient populations and standard of care during the COVID-19 pandemic. A common partial mitigation is transfer learning by pretraining a “generic network” on a large nonmedical data set and then fine-tuning on a task-specific radiology data set. Purpose: To reduce data set size requirements for chest radiography deep learning models by using an advanced machine learning approach (supervised contrastive [SupCon] learning) to generate chest radiography networks. Materials and Methods: SupCon helped generate chest radiography networks from 821 544 chest radiographs from India and the United States. The chest radiography networks were used as a starting point for further machine learning model development for 10 prediction tasks (eg, airspace opacity, fracture, tuberculosis, and COVID-19 outcomes) by using five data sets comprising 684 955 chest radiographs from India, the United States, and China. Three model development setups were tested (linear classifier, nonlinear classifier, and fine-tuning the full network) with different data set sizes from eight to 85. Results: Across a majority of tasks, compared with transfer learning from a nonmedical data set, SupCon reduced label requirements up to 688-fold and improved the area under the receiver operating characteristic curve (AUC) at matching data set sizes. At the extreme low-data regimen, training small nonlinear models by using only 45 chest radiographs yielded an AUC of 0.95 (noninferior to radiologist performance) in classifying microbiology-confirmed tuberculosis in external validation. At a more moderate data regimen, training small nonlinear models by using only 528 chest radiographs yielded an AUC of 0.75 in predicting severe COVID-19 outcomes. Conclusion: Supervised contrastive learning enabled performance comparable to state-of-the-art deep learning models in multiple clinical tasks by using as few as 45 images and is a promising method for predictive modeling with use of small data sets and for predicting outcomes in shifting patient populations.
View details
Deep learning for distinguishing normal versus abnormal chest radiographs and generalization to two unseen diseases tuberculosis and COVID-19
Shahar Jamshy
Charles Lau
Eddie Santos
Atilla Peter Kiraly
Jie Yang
Rory Pilgrim
Sahar Kazemzadeh
Jin Yu
Lily Hao Yi Peng
Krish Eswaran
Neeral Beladia
Cameron Chen
Shravya Ramesh Shetty
Scientific Reports (2021)
Preview abstract
Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to detect every possible condition by building multiple separate systems, each of which detects one or more pre-specified conditions. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For training and tuning the system, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system trained using a large dataset containing a diverse array of CXR abnormalities generalizes to new patient populations and unseen diseases. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7–28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist. Lastly, to facilitate the continued development of AI models for CXR, we release our collected labels for the publicly available dataset.
View details