Wei-Hung Weng

Authored Publications
    Abstract: Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how best to design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components (GPPEs) from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, building on previously published data, to support the argument that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
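    Where downstream models are evaluated for bias as advocated above, one simple starting point is to stratify a held-out metric by sensitive attribute. The sketch below illustrates such a subgroup audit; the column names ("y_true", "y_score", "subgroup") and the 0.05 gap threshold are hypothetical placeholders, not part of the work described above.

```python
# Minimal sketch of a per-subgroup audit of a downstream classifier.
# Column names and the gap margin are hypothetical placeholders.
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auc(df: pd.DataFrame) -> pd.Series:
    """AUC computed separately within each sensitive-attribute subgroup."""
    return df.groupby("subgroup").apply(
        lambda g: roc_auc_score(g["y_true"], g["y_score"])
    )

def flag_gaps(df: pd.DataFrame, margin: float = 0.05) -> pd.Series:
    """Subgroups whose AUC trails the overall AUC by more than `margin`."""
    overall = roc_auc_score(df["y_true"], df["y_score"])
    per_group = subgroup_auc(df)
    return per_group[per_group < overall - margin]
```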
    ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
    Shawn Xu
    Lin Yang
    Timo Kohlberger
    Martin Ma
    Atilla Kiraly
    Sahar Kazemzadeh
    Zakkai Melamed
    Jungyeon Park
    Patricia MacWilliams
    Chuck Lau
    Christina Chen
    Mozziyar Etemadi
    Sreenivasa Raju Kalidindi
    Kat Chou
    Shravya Shetty
    Daniel Golden
    Rory Pilgrim
    arXiv (2023)
    Abstract: Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined with, or grafted onto, a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
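    As a rough illustration of the adapter idea in the abstract above (a small trainable bridge between a frozen image encoder and a frozen LLM), here is a minimal sketch. The dimensions, token count, and single-layer projection are assumptions for illustration, not the actual ELIXR implementation.

```python
# Sketch of a lightweight adapter that maps a frozen image-encoder embedding
# to a short sequence of "soft tokens" in a frozen LLM's embedding space.
# img_dim, llm_dim, and num_tokens are assumed values, not ELIXR's.
import torch
import torch.nn as nn

class ImageToLLMAdapter(nn.Module):
    def __init__(self, img_dim=1024, llm_dim=4096, num_tokens=32):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(img_dim, num_tokens * llm_dim)

    def forward(self, img_emb: torch.Tensor) -> torch.Tensor:
        # img_emb: (batch, img_dim) -> (batch, num_tokens, llm_dim)
        return self.proj(img_emb).view(img_emb.shape[0], self.num_tokens, -1)

adapter = ImageToLLMAdapter()
soft_tokens = adapter(torch.randn(2, 1024))  # torch.Size([2, 32, 4096])
# In training, such soft tokens would be prepended to the report text tokens
# and optimized with the LLM's next-token loss while encoder and LLM stay frozen.
```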
    Abstract: Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNets for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.
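    The contrastive setup described above pulls two augmented views of the same recording together in embedding space. A minimal sketch of that training objective (the NT-Xent loss used by SimCLR) follows; the batch size, embedding dimension, and temperature are placeholders rather than the paper's Slowfast NFNet configuration.

```python
# Sketch of the SimCLR NT-Xent objective over two augmented views per clip.
# z1 and z2 would come from encoder(augment(x)) with two random augmentations.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1):
    """Normalized temperature-scaled cross-entropy over a batch of view pairs."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
    sim = z @ z.t() / temperature                        # cosine similarities
    n = z1.shape[0]
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    # The positive for view i is the other augmentation of the same clip.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)  # placeholder embeddings
print(nt_xent(z1, z2).item())
```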
    Abstract: Class imbalance is a common problem in medical diagnosis, causing a standard classifier to be biased towards majority classes and to ignore the importance of the rest. This is especially true for dermatology, a specialty with thousands of skin conditions, many of which occur rarely in the wild. Buoyed by recent advances, we explore meta-learning based few-shot learning approaches for the skin condition recognition problem and propose an evaluation setup to fairly assess the real-world impact of such approaches. When compared with conventional class imbalance techniques, we find that state-of-the-art few-shot learning methods are not as performant, but combining the two approaches using a novel ensemble leads to improvement in all-way classification, especially for the rare classes. We conclude that the ensemble can be useful for addressing the class imbalance problem, yet progress here can be further accelerated by the use of real-world evaluation setups for benchmarking new methods.
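    One way to combine a prototype-based few-shot classifier with a conventional classifier, in the spirit of the ensemble described above, is a weighted blend of their class scores. The sketch below assumes precomputed features; the prototype formulation and the blending weight are illustrative choices, not the paper's exact ensemble.

```python
# Sketch: blend few-shot prototype scores with a conventional classifier's
# probabilities. Feature extraction and alpha are assumed, not the paper's setup.
import numpy as np

def prototype_scores(query_feats, support_feats, support_labels, num_classes):
    """Negative distance from each query to each class prototype (support mean)."""
    protos = np.stack([
        support_feats[support_labels == c].mean(axis=0) for c in range(num_classes)
    ])
    return -np.linalg.norm(query_feats[:, None, :] - protos[None, :, :], axis=-1)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ensemble_probs(proto_scores, classifier_probs, alpha=0.5):
    """Weighted blend of few-shot scores and standard-classifier probabilities."""
    return alpha * softmax(proto_scores) + (1 - alpha) * classifier_probs
```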
    Human-centric Metric for Accelerating Pathology Reports Annotation
    Ruibin Ma
    Cameron Chen
    Angela Lin
    Krishna Kumar Gadepalli
    Yuannan Cai
    arXiv (2019)
    Abstract: Pathology reports written by physicians contain useful class information such as the main organ type, disease type, etc. This class information can be used for large-scale statistical analysis or for labelling data in other modalities, such as pathology slices (images). However, manually classifying a huge number of reports on multiple tasks is very inefficient, and the reports are hard for non-professionals to read. In this paper, we investigate a general-purpose NLP model, BERT, for multilabel text classification. We test it on five different classification tasks and achieve good discrimination. More importantly, we evaluate it in a practical setting by measuring how much human annotation labor can be saved and the performance on automatically classified cases.
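    A minimal sketch of the kind of multilabel report classification described above, using the Hugging Face transformers API with a BERT checkpoint; the checkpoint name, label count, example report, and human-routing rule are placeholders, not the study's actual setup.

```python
# Sketch: multilabel classification of a pathology report with a BERT encoder.
# MODEL, NUM_LABELS, and the example text are hypothetical placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-uncased"   # placeholder checkpoint
NUM_LABELS = 5                # e.g. one label per classification task

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=NUM_LABELS, problem_type="multi_label_classification"
)

report = "Specimen: colon, biopsy. Diagnosis: adenocarcinoma."  # hypothetical text
inputs = tokenizer(report, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)  # one probability per label

# Low-confidence cases can be routed to human annotators; the fraction that
# need no review is one way to quantify saved annotation labor.
print(probs)
```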
    Abstract: Metadata are general characteristics of the data in a well-curated and condensed format, and have proven useful for decision making, knowledge discovery, and the organization of heterogeneous biobank data. Among all data types in a biobank, pathology is the key component and also serves as the gold standard of diagnosis. To maximize the capability of a biobank and allow the rapid progress of biomedical science, utilizing pathology metadata is essential, yet it requires enormous expert effort to annotate because of the unstructured nature and complexity of pathology information. In this study, we develop a multimodal multitask learning framework that learns generalizable representations of pathology data to predict four major biobank metadata fields for pathology images. We demonstrate that incorporating multimodal information, such as free text and case-level categorical data, improves metadata prediction performance when multiple downstream tasks are considered simultaneously. Such a pathology metadata prediction system may be adopted to reduce the expert effort of manual annotation and ultimately accelerate data-driven research through better utilization of the pathology biobank.
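    A minimal sketch of a multimodal multitask model in the spirit of the framework above: image and text features are fused, and one classification head per metadata field is trained jointly. The feature dimensions, fusion layer, and the four field names are assumptions for illustration, not the paper's architecture.

```python
# Sketch: fuse image and text features, predict several metadata fields jointly.
# Dimensions and the metadata field names/class counts are hypothetical.
import torch
import torch.nn as nn

class MetadataModel(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, task_classes=None):
        super().__init__()
        task_classes = task_classes or {"organ": 10, "disease": 20,
                                        "procedure": 5, "specimen": 8}
        self.fuse = nn.Sequential(nn.Linear(img_dim + txt_dim, hidden), nn.ReLU())
        # One classification head per metadata field, trained jointly.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in task_classes.items()}
        )

    def forward(self, img_feat, txt_feat):
        h = self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
        return {name: head(h) for name, head in self.heads.items()}

model = MetadataModel()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
# Training would sum a cross-entropy loss over the metadata tasks.
```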