Abhijit Guha Roy
Abhijit is a Research Engineer at Google Health London. His current research focuses on AI safety for medical applications. Prior to joining Google, he completed a PhD in medical image segmentation at Ludwig Maximilian University of Munich, Germany.
Authored Publications
Generative models improve fairness of medical classifiers under distribution shifts
Ira Ktena
Olivia Wiles
Isabela Albuquerque
Sylvestre-Alvise Rebuffi
Ryutaro Tanno
Danielle Belgrave
Taylan Cemgil
Nature Medicine (2024)
Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and ‘labeling’ by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or the available clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair both in distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
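The core recipe here is to complement the real training set with synthetic images sampled from a label-conditioned diffusion model, oversampling subgroups that are rare in the real data. Below is a minimal sketch of that mixing step only, assuming the synthetic images have already been generated offline; the function name, array layout, and `synthetic_ratio` parameter are illustrative, not the paper's implementation:

```python
import numpy as np

def mix_real_and_synthetic(real_images, real_labels, real_groups,
                           synth_images, synth_labels, synth_groups,
                           synthetic_ratio=0.5, rng=None):
    """Complement real training data with diffusion-generated synthetic
    samples, oversampling underrepresented subgroups (names illustrative)."""
    rng = rng or np.random.default_rng(0)
    # Target count so synthetic samples make up `synthetic_ratio`
    # of the final mixed training set.
    n_synth = int(len(real_images) * synthetic_ratio / (1.0 - synthetic_ratio))
    # Weight each synthetic candidate inversely to how often its
    # subgroup appears in the real data, so rare groups are enriched.
    group_counts = {g: int((real_groups == g).sum())
                    for g in np.unique(real_groups)}
    weights = np.array([1.0 / group_counts.get(g, 1) for g in synth_groups],
                       dtype=float)
    idx = rng.choice(len(synth_images), size=n_synth,
                     p=weights / weights.sum())
    images = np.concatenate([real_images, synth_images[idx]])
    labels = np.concatenate([real_labels, synth_labels[idx]])
    return images, labels
```

How the real-to-synthetic ratio is chosen and what the diffusion model conditions on (diagnosis alone, or diagnosis plus subgroup attributes) are task-specific choices that this sketch does not fix; uniform group balancing stands in for them here.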
Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
Laura Anne Culp
Jan Freyberg
Basil Mustafa
Sebastien Baur
Simon Kornblith
Ting Chen
Patricia MacWilliams
Sara Mahdavi
Megan Zoë Walker
Aaron Loh
Cameron Chen
Scott Mayer McKinney
Jim Winkens
Zach William Beaver
Fiona Keleher Ryan
Mozziyar Etemadi
Umesh Telang
Lily Hao Yi Peng
Geoffrey Everest Hinton
Neil Houlsby
Mohammad Norouzi
Nature Biomedical Engineering (2023)
Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such ‘out-of-distribution’ performance problems and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for ‘Robust and Efficient Medical Imaging with Self-supervision’), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1–33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
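The intermediate self-supervised step can be instantiated with a SimCLR-style contrastive (NT-Xent) objective: two augmented views of each unlabeled medical image are pulled together in embedding space while all other images in the batch are pushed apart. A minimal sketch of that loss in PyTorch follows; the temperature value is illustrative, and this is a standard formulation rather than the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

def ntxent_loss(z1, z2, temperature=0.1):
    """SimCLR-style NT-Xent loss on two augmented views of a batch.

    z1, z2: (N, D) embeddings of the same N medical images under
    two different augmentations.
    """
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2]), dim=1)    # (2N, D), unit norm
    sim = z @ z.t() / temperature                  # scaled cosine similarities
    # Exclude self-similarity from the softmax denominator.
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device),
                     float("-inf"))
    # Row i's positive is its other view: i+N for the first half, i-N after.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(z.device))
```

In the full pipeline the encoder is first initialized from large-scale supervised pretraining on natural images, this contrastive step then adapts it to the medical domain without labels, and a final supervised fine-tune targets the diagnostic task.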
Conformal prediction under ambiguous ground truth
David Stutz
Tatiana Matejovicova
Patricia Strachan
Taylan Cemgil
Arnaud Doucet
TMLR (2023)
In safety-critical classification tasks, conformal prediction allows one to perform rigorous uncertainty quantification by providing confidence sets that include the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and are usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. When expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have “crisp”, definitive ground truth labels, and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such settings with ambiguous ground truth, which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
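One way to account for label ambiguity at calibration time is Monte Carlo style: instead of committing to one crisp label per calibration example, sample several plausible labels from the plug-in posterior (e.g., normalized expert votes) and pool the resulting nonconformity scores. The sketch below illustrates that idea under those assumptions; the function names, the 1 − p score, and m are illustrative, and the exchangeability corrections developed in the paper are omitted:

```python
import numpy as np

def mc_calibrate(cal_probs, cal_label_dists, alpha=0.1, m=10, rng=None):
    """Calibrate a score threshold when ground truth is ambiguous.

    cal_probs:       (n, K) model probabilities on the calibration set.
    cal_label_dists: (n, K) plug-in posterior over labels, e.g.
                     normalized expert votes (assumed available).
    """
    rng = rng or np.random.default_rng(0)
    scores = []
    for p, dist in zip(cal_probs, cal_label_dists):
        ys = rng.choice(len(dist), size=m, p=dist)  # m plausible labels
        scores.extend(1.0 - p[ys])                  # nonconformity scores
    scores = np.sort(np.asarray(scores))
    # Conformal quantile at level ceil((n+1)(1-alpha))/n over pooled scores.
    k = int(np.ceil((len(scores) + 1) * (1.0 - alpha)))
    return scores[min(k, len(scores)) - 1]

def predict_sets(test_probs, qhat):
    """Confidence set = every class whose nonconformity is below qhat."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```

With crisp labels (a one-hot `cal_label_dists` and m = 1) this reduces to ordinary split conformal prediction, which is a useful sanity check on the construction.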
Does Your Dermatology Classifier Know What It Doesn't Know? Detecting the Long-Tail of Unseen Conditions
Aaron Loh
Basil Mustafa
Nick Pawlowski
Jan Freyberg
Zach William Beaver
Nam Vo
Peggy Bui
Samantha Winter
Patricia MacWilliams
Umesh Telang
Taylan Cemgil
Jim Winkens
Medical Image Analysis (2021)
Supervised deep learning models have proven to be highly effective in the classification of dermatological conditions. These models rely on the availability of abundant labeled training examples. However, in the real world, many dermatological conditions are individually too infrequent for per-condition classification with supervised learning. Although individually infrequent, these conditions may collectively be common and therefore are clinically significant in aggregate. To avoid models generating erroneous outputs on such examples, there remains a considerable unmet need for deep learning systems that can better detect such infrequent conditions. These infrequent 'outlier' conditions are seen very rarely (or not at all) during training. In this paper, we frame this task as an out-of-distribution (OOD) detection problem. We set up a benchmark ensuring that outlier conditions are disjoint between model train, validation, and test sets. Unlike most traditional OOD benchmarks, which detect dataset distribution shift, we aim at detecting semantic differences, often referred to as near-OOD detection, which is a more difficult task. We propose a novel hierarchical outlier detection (HOD) approach, which assigns multiple abstention classes for each training outlier class and jointly performs a coarse classification of inliers vs. outliers, along with fine-grained classification of the individual classes. We demonstrate that the proposed HOD outperforms existing techniques for outlier-exposure-based OOD detection. We also use different state-of-the-art representation learning approaches (BiT-JFT, SimCLR, MICLe) to improve OOD performance and demonstrate the effectiveness of the HOD loss for them.
Further, we explore different ensembling strategies for OOD detection and propose a diverse ensemble selection process to obtain the best results. We also perform a subgroup analysis over conditions of varying risk levels and different skin types to investigate how OOD performance changes over each subgroup, and demonstrate the gains of our framework over the baselines. Furthermore, we go beyond traditional performance metrics and introduce a cost metric to approximate downstream clinical impact. We use this cost metric to compare the proposed method against the baseline, making a stronger case for its effectiveness in real-world deployment scenarios.
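The HOD idea can be sketched as a two-level objective: fine-grained cross-entropy over inlier classes plus per-outlier-class abstention heads, and a coarse inlier-vs-outlier term obtained by summing probability mass over the abstention classes. A minimal PyTorch sketch follows; `coarse_weight` and the exact aggregation are illustrative choices, not the paper's precise formulation:

```python
import torch
import torch.nn.functional as F

def hod_loss(logits, fine_labels, n_inlier, coarse_weight=0.5):
    """Two-level hierarchical outlier detection loss (a sketch).

    logits:      (B, n_inlier + n_outlier) fine-grained class logits;
                 columns >= n_inlier are training-time abstention classes.
    fine_labels: (B,) fine class indices over the same columns.
    """
    # Fine-grained term: standard cross-entropy over all classes,
    # inlier and abstention alike.
    fine_loss = F.cross_entropy(logits, fine_labels)
    # Coarse term: aggregate probability mass into inlier vs. outlier.
    probs = logits.softmax(dim=1)
    p_out = probs[:, n_inlier:].sum(dim=1).clamp(1e-6, 1.0 - 1e-6)
    is_out = (fine_labels >= n_inlier).float()
    coarse_loss = F.binary_cross_entropy(p_out, is_out)
    return fine_loss + coarse_weight * coarse_loss
```

At test time, the natural OOD score in this formulation is the summed probability on the abstention classes, `probs[:, n_inlier:].sum(dim=1)`: unseen conditions should route their probability mass there rather than onto any single inlier diagnosis.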