Wei-Hung Weng

Research Areas

Machine intelligence
Machine perception

Authored Publications

Towards Conversational AI for Disease Management

Valentin Liévin, Anil Palepu, Wei-Hung Weng, Khaled Saab, David Stutz, Yong Cheng, Kavita Kulkarni, Sara Mahdavi, Joelle Barral, Dale Webster, Avinatan Hassidim, Yossi Matias, James Manyika, Ryutaro Tanno, Vivek Natarajan, Adam Rodman, Tao Tu, Alan Karthikesalingam, Mike Schaekermann

Nature (2026)

While large language models (LLMs) have shown promise in diagnostic dialogue, their capabilities for effective management reasoning - including disease progression, therapeutic response, and safe medication prescription - remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE) through a new LLM-based agentic system optimised for clinical management and dialogue, incorporating reasoning over the evolution of disease and multiple patient visit encounters, response to therapy, and professional competence in medication prescription. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini's long-context capabilities, combining in-context retrieval with structured reasoning to align its output with relevant and up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialist physicians and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding of management plans in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. While AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE's strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.

AI mirrors experimental science to uncover a novel mechanism of gene transfer crucial to bacterial evolution

Juro Gottweis, Alan Karthikesalingam, Wei-Hung Weng, Anil Palepu, Jose R Penades, Alexander Daryin, Artiom Myaskovsky, Annalisa Pawlosky, Tiago R D Costa, Tao Tu

Cell (2025)

AI models have been proposed for hypothesis generation, but testing their ability to drive high-impact research is challenging, since an AI-generated hypothesis can take decades to validate. Here, we challenge the ability of a recently developed LLM-based platform to generate high-level hypotheses by posing a question that took years to resolve experimentally but remained unpublished: How could capsid-forming phage-inducible chromosomal islands (cf-PICIs) spread across bacterial species? Remarkably, the AI’s top-ranked hypothesis matched our experimentally confirmed mechanism: cf-PICIs hijack diverse phage tails to expand their host range. We critically assess the AI’s five highest-ranked hypotheses, showing that some opened new research avenues in our laboratories. We benchmark its performance against other LLMs and outline best practices for integrating AI into scientific discovery. Our findings suggest that AI can act not just as a computational tool, but as a creative engine, accelerating discovery and reshaping how we generate and test scientific hypotheses.

Predicting Cardiovascular Disease Risk using Photoplethysmography and Deep Learning

Wei-Hung Weng, Sebastien Baur, Mayank Daswani, Christina Chen, Lauren Harrell, Sujay Kakarmath, Mariam Jabara, Babak Behsaz, Cory McLean, Yossi Matias, Greg Corrado, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Goodarz Danaei, Diego Ardila

PLOS Global Public Health, 4(6) (2024), e0003204

Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention are critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. We investigated the potential to use photoplethysmography (PPG), a sensing technology available on most smartphones that can potentially enable large-scale screening at low cost, for CVD risk prediction. We developed a deep learning PPG-based CVD risk score (DLS) to predict the probability of having major adverse cardiovascular events (MACE: non-fatal myocardial infarction, stroke, and cardiovascular death) within ten years, given only age, sex, smoking status and PPG as predictors. We compared the DLS with the office-based refit-WHO score, which adopts the shared predictors from the WHO and Globorisk scores (age, sex, smoking status, height, weight and systolic blood pressure) but is refitted on the UK Biobank (UKB) cohort. All models were trained on a development dataset (141,509 participants) and evaluated on a geographically separate test dataset (54,856 participants), both from UKB. The DLS’s C-statistic (71.1%, 95% CI 69.9–72.4) is non-inferior to that of the office-based refit-WHO score (70.9%, 95% CI 69.7–72.2; non-inferiority margin of 2.5%, p<0.01) in the test dataset. The calibration of the DLS is satisfactory, with a 1.8% mean absolute calibration error. Adding DLS features to the office-based score increases the C-statistic by 1.0% (95% CI 0.6–1.4). The DLS predicts ten-year MACE risk comparably to the office-based refit-WHO score. Interpretability analyses suggest that the DLS-extracted features are related to PPG waveform morphology and are independent of heart rate. Our study provides a proof-of-concept and suggests the potential of PPG-based strategies for community-based primary prevention in resource-limited regions.
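For readers unfamiliar with the C-statistic reported in this abstract, the following is a minimal sketch of the simplest, binary-outcome form of the measure: the fraction of (event, non-event) pairs in which the participant with the event received the higher risk score. The function name and the toy data are hypothetical, and the study itself may have used a censoring-aware variant for ten-year outcomes, so this illustrates the concept only and is not the authors' evaluation code.

```python
from itertools import product


def c_statistic(risk_scores, events):
    """Binary-outcome concordance: share of (event, non-event) pairs in which
    the event case has the higher score. Tied scores count as half-concordant.

    Assumption: outcomes are plain 0/1 labels with no censoring.
    """
    cases = [s for s, e in zip(risk_scores, events) if e == 1]
    controls = [s for s, e in zip(risk_scores, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data, not taken from the study.
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.1]
toy_events = [1, 0, 1, 0, 0]
toy_c = c_statistic(toy_scores, toy_events)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported 71.1% and 70.9% should be read.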

An intentional approach to managing bias in embedding models

Wei-Hung Weng, Andrew Sellergren, Atilla P. Kiraly, Alexander D'Amour, Jungyeon Park, Rory Pilgrim, Stephen Pfohl, Charles Lau, Vivek Natarajan, Shekoofeh Azizi, Alan Karthikesalingam, Heather Cole-Lewis, Yossi Matias, Greg S. Corrado, Dale R. Webster, Shravya Shetty, Shruthi Prabhakara, Krish Eswaran, Leo Anthony Celi, Yun Liu

The Lancet Digital Health, 6 (2024), E126-E130

Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well-intentioned attempts to prevent the upstream components (GPPEs) from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.

Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals

Louis Blankemeier, Sebastien Baur, Wei-Hung Weng, Jake Garrison, Yossi Matias, Shruthi Prabhakara, Diego Ardila, Zaid Nabulsi

arXiv (2023)

Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNets for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.

ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia MacWilliams, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Kat Chou, Greg Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, Rory Pilgrim, Krish Eswaran, Andrew Sellergren, Yossi Matias

arXiv (2023)

Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
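To make the semantic search metric in this abstract concrete, here is a minimal sketch of normalized discounted cumulative gain (NDCG) for a single query. It assumes the common linear-gain, log2-discount formulation; the abstract does not state which variant or cutoff the authors used, and the function names and toy relevance grades are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: graded relevance divided by log2(rank + 1),
    with ranks starting at 1 (enumerate starts at 0, hence the + 2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the returned order divided by the DCG of the ideal order.

    Returns 0.0 when no returned item is relevant (ideal DCG is zero).
    """
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of the results returned for one query,
# listed in the order the system ranked them.
toy_ranking = [3, 2, 0, 1]
toy_ndcg = ndcg(toy_ranking)
```

Under this definition, a query whose results are already in ideal order scores exactly 1.0, which is what "perfect retrieval" on a query corresponds to.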

Addressing the Real-world Class Imbalance Problem in Dermatology

Gamaleldin Fathy Elsayed, Jon Deaton, Vivek Natarajan, Wei-Hung Weng, Yuan Liu

NeurIPS ML4H 2020 (2020)

Class imbalance is a common problem in medical diagnosis, causing a standard classifier to be biased towards majority classes and ignore the importance of the rest. This is especially true for dermatology, a specialty with thousands of skin conditions, many of which rarely occur in the wild. Buoyed by recent advances, we explore meta-learning based few-shot learning approaches for the skin condition recognition problem and propose an evaluation setup to fairly assess the real-world impact of such approaches. When compared to conventional class imbalance techniques, we find that the state-of-the-art few-shot learning methods are not as performant, but combining the two approaches using a novel ensemble leads to improvement in all-way classification, especially for the rare classes. We conclude that the ensemble can be useful to address the class imbalance problem, yet progress here can further be accelerated by the use of real-world evaluation setups for benchmarking new methods.

Multimodal Multitask Representation Learning for Pathology Biobank Metadata Prediction

Wei-Hung Weng, Yuannan Cai, Angela Lin, Fraser Tan, Cameron Chen

(2019)

Metadata are general characteristics of the data in a well-curated and condensed format, and have been proven to be useful for decision making, knowledge discovery, and the organization of heterogeneous biobank data. Among all data types in the biobank, pathology is the key component and also serves as the gold standard of diagnosis. To maximize the capability of the biobank and allow the rapid progress of biomedical science, utilizing the pathology metadata is essential, yet it requires enormous expert effort to annotate due to the unstructured nature and complexity of pathology information. In this study, we develop a multimodal multitask learning framework that learns generalizable representations of pathology data to predict four major biobank metadata fields of the pathology images. We demonstrate that incorporating multimodal information, such as texts and case-level categorical data, improves the metadata prediction performance while multiple downstream tasks are considered simultaneously. Such a pathology metadata prediction system may be adopted to reduce the expert effort of manual annotation and ultimately accelerate data-driven research through better utilization of the pathology biobank.

Human-centric Metric for Accelerating Pathology Reports Annotation

Ruibin Ma, Cameron Chen, Gang Li, Wei-Hung Weng, Angela Lin, Krishna Kumar Gadepalli, Yuannan Cai

arXiv (2019)

Pathology medical reports written by physicians contain useful class information such as the main organ type, disease type, etc. This class information can be used for large-scale statistical analysis or for labelling data in other modalities such as pathology slices (images). However, manual classification of a huge number of reports on multiple tasks is very inefficient. Moreover, the reports are very hard to read for non-professionals. In this paper, we investigate a general-purpose NLP model called BERT on multilabel text classification. We test it on five different classification tasks and achieve good discrimination. More importantly, we evaluate it in a practical setting by measuring how much human annotation labor can be saved and the performance on automatically classified cases.
