Towards Generalist Biomedical AI

Tao Tu

Shek Azizi

Danny Driess

Mike Schaekermann

Mohamed Amin

Pi-Chuan Chang

Andrew Carroll

Chuck Lau

Ryutaro Tanno

Ira Ktena

Anil Palepu

Basil Mustafa

Aakanksha Chowdhery

Yun Liu

Simon Kornblith

David Fleet

Philip Mansfield

Sushant Prakash

Renee Wong

Sunny Virmani

Christopher Semturs

Sara Mahdavi

Bradley Green

Ewa Dominowska

Blaise Aguera-Arcas

Joelle Barral

Dale Webster

Greg Corrado

Yossi Matias

Karan Singhal

Pete Florence

Alan Karthikesalingam

Vivek Natarajan

NEJM AI (2024)

Abstract

BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.

METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.

RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.

CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.
