
Boris Babenko

Boris is a software engineer in Google Health, where he works on discovering new imaging biomarkers using machine learning. Prior to Google, Boris worked on satellite image analysis at Orbital Insight, a consumer photo product at Dropbox, and co-founded a data labeling startup. He received a PhD in Computer Science from UC San Diego.
Authored Publications
    Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging
    Laura Anne Culp
    Jan Freyberg
    Basil Mustafa
    Sebastien Baur
    Simon Kornblith
    Ting Chen
    Patricia MacWilliams
    Sara Mahdavi
    Megan Zoë Walker
    Aaron Loh
    Cameron Chen
    Scott Mayer McKinney
    Zach William Beaver
    Fiona Keleher Ryan
    Mozziyar Etemadi
    Umesh Telang
    Lily Hao Yi Peng
    Geoffrey Everest Hinton
    Mohammad Norouzi
    Nature Biomedical Engineering (2023)
    Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates such ‘out-of-distribution’ performance problems and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for ‘Robust and Efficient Medical Imaging with Self-supervision’), combines large-scale supervised transfer learning on natural images and intermediate contrastive self-supervised learning on medical images, and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracies by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings required only 1–33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
    Longitudinal fundus imaging and its genome-wide association analysis provides evidence for a human retinal aging clock
    Sara Ahadi
    Kenneth A Wilson Jr
    Orion Pritchard
    Ajay Kumar
    Enrique M Carrera
    Ricardo Lamy
    Jay M Stewart
    Avinash Varadarajan
    Pankaj Kapahi
    Ali Bashir
    eLife (2023)
    Background: Biological age, distinct from an individual’s chronological age, has been studied extensively through predictive aging clocks. However, these clocks have limited accuracy on short time-scales. Deep learning approaches on imaging datasets of the eye have proven powerful for a variety of quantitative phenotype inference tasks and provide an opportunity to explore organismal aging and tissue health. Methods: Here we trained deep learning models on fundus images from the EyePACS dataset to predict individuals’ chronological age. These predictions led to the concept of a retinal aging clock, which we then employed for a series of downstream longitudinal analyses. The retinal aging clock was used to assess the predictive power of aging inference, termed eyeAge, on short time-scales using longitudinal fundus imaging data from a subset of patients. Additionally, the model was applied to a separate cohort from the UK Biobank to validate the model and perform a genome-wide association study (GWAS). The top candidate gene was then tested in a fly model of eye aging. Findings: EyeAge predicted age with a mean absolute error of 3.26 years, much lower than that of other aging clocks. Additionally, eyeAge was highly independent of blood marker-based measures of biological age (e.g. “phenotypic age”), maintaining a hazard ratio of 1.026 even in the presence of phenotypic age. Longitudinal studies showed that the resulting models were able to predict individuals’ aging on time-scales of less than a year with 71% accuracy. Notably, we observed a significant individual-specific component to the prediction. This observation was confirmed with the identification of multiple GWAS hits in the independent UK Biobank cohort. Knockdown of the top hit, ALKAL2, which was previously shown to extend lifespan in flies, also slowed age-related decline in vision in flies.
Interpretation: In conclusion, predicted age from retinal images can be used as a biomarker of biological aging in a given individual, independently of phenotypic age. This study demonstrates the utility of a retinal aging clock for studying aging and age-related diseases and for quantitatively measuring aging on very short time-scales, potentially opening avenues for quick and actionable evaluation of gero-protective therapeutics.
    Discovering novel systemic biomarkers in external eye photos
    Ilana Traynis
    Christina Chen
    Akib Uddin
    Jorge Cuadros
    Lauren P. Daskivich
    April Y. Maa
    Ramasamy Kim
    Eugene Yu-Chuan Kang
    Lily Peng
    Avinash Varadarajan
    The Lancet Digital Health (2023)
    Background: Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions. Methods: We developed a deep learning system (DLS) that takes external eye photographs as input and predicts systemic parameters, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets). This DLS was trained using 123 130 images from 38 398 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles County, CA, USA. Evaluation focused on nine prespecified systemic parameters and leveraged three validation sets (A, B, C) spanning 25 510 patients with and without diabetes undergoing eye screening in three independent sites in Los Angeles County, CA, and the greater Atlanta area, GA, USA. We compared performance against baseline models incorporating available clinicodemographic variables (eg, age, sex, race and ethnicity, years with diabetes). Findings: Relative to the baseline, the DLS achieved statistically significantly superior performance at detecting AST >36.0 U/L, calcium <8.6 mg/dL, eGFR <60.0 mL/min/1.73 m², haemoglobin <11.0 g/dL, platelets <150.0 × 10³/μL, ACR ≥300 mg/g, and WBC <4.0 × 10³/μL on validation set A (a population resembling the development datasets), with the area under the receiver operating characteristic curve (AUC) of the DLS exceeding that of the baseline by 5.3–19.9% (absolute differences in AUC).
On validation sets B and C, with substantial patient population differences compared with the development datasets, the DLS outperformed the baseline for ACR ≥300.0 mg/g and haemoglobin <11.0 g/dL by 7.3–13.2%. Interpretation: We found further evidence that external eye photographs contain biomarkers spanning multiple organ systems. Such biomarkers could enable accessible and non-invasive screening of disease. Further work is needed to understand the translational implications.
    Detection of signs of disease in external photographs of the eyes via deep learning
    Akinori Mitani
    Ilana Traynis
    Naho Kitade
    April Maa
    Jorge Cuadros
    Lily Hao Yi Peng
    Avinash Vaidyanathan Varadarajan
    Nature Biomedical Engineering (2022)
    Retinal fundus photographs can be used to detect a range of retinal conditions. Here we show that deep-learning models trained instead on external photographs of the eyes can be used to detect diabetic retinopathy (DR), diabetic macular oedema and poor blood glucose control. We developed the models using eye photographs from 145,832 patients with diabetes from 301 DR screening sites and evaluated the models on four tasks and four validation datasets with a total of 48,644 patients from 198 additional screening sites. For all four tasks, the predictive performance of the deep-learning models was significantly higher than the performance of logistic regression models using self-reported demographic and medical history data, and the predictions generalized to patients with dilated pupils, to patients from a different DR screening programme and to a general eye care programme that included diabetics and non-diabetics. We also explored the use of the deep-learning models for the detection of elevated lipid levels. The utility of external eye photographs for the diagnosis and management of diseases should be further validated with images from different cameras and patient populations.
    AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models learn from the training data is severely lacking. Explanations are needed to increase doctors’ trust in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts. In this paper, we present a method for automatic visual explanations that can help achieve these goals by generating hypotheses about which visual signals in the images are correlated with the task. We propose the following four steps: (i) train a classifier to perform a given task, to assess whether the imagery indeed contains signals relevant to the task; (ii) train a StyleGAN-based image generator with an architecture that enables guidance by the classifier (“StylEx”); (iii) automatically detect and extract the top visual attributes that the classifier is sensitive to, each of which can then be independently modified for a set of images to generate counterfactual visualizations of that attribute (i.e. what the image would look like with the attribute increased or decreased); (iv) present the discovered attributes and corresponding counterfactual visualizations to a multidisciplinary panel of experts to formulate hypotheses for the underlying mechanisms, with consideration of social and structural determinants of health (e.g. whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries), and to stimulate future research. To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs.
We showcase examples where many of the automatically-learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically-learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible novel attributes for future investigation (e.g., differences in the fundus associated with self-reported sex, which were previously unknown). While our approach cannot discern causal pathways, the ability to generate hypotheses from the attribute visualizations has the potential to help researchers better understand AI-based models, improve their assessment, and extract new knowledge from them. Importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real-world nature of healthcare delivery and socio-cultural factors; hence multidisciplinary perspectives are critical in these investigations. Finally, we release code to enable researchers to train their own StylEx models and analyze their predictive tasks of interest, and to use the methodology presented in this paper for responsible interpretation of the revealed attributes.
    Predicting the risk of developing diabetic retinopathy using deep learning
    Ashish Bora
    Siva Balasubramanian
    Sunny Virmani
    Akinori Mitani
    Guilherme De Oliveira Marinho
    Jorge Cuadros
    Paisan Raumviboonsuk
    Lily Hao Yi Peng
    Avinash Vaidyanathan Varadarajan
    The Lancet Digital Health (2020)
    Background: Diabetic retinopathy screening is instrumental to preventing blindness, but scaling up screening is challenging because of the increasing number of patients with all forms of diabetes. We aimed to create a deep-learning system to predict the risk of patients with diabetes developing diabetic retinopathy within 2 years. Methods: We created and validated two versions of a deep-learning system to predict the development of diabetic retinopathy in patients with diabetes who had had teleretinal diabetic retinopathy screening in a primary care setting. The input for the two versions was either a set of three-field or one-field colour fundus photographs. Of the 575 431 eyes in the development set, 28 899 had known outcomes, with the remaining 546 532 eyes used to augment the training process via multitask learning. Validation was done on one eye (selected at random) per patient from two datasets: an internal validation set (from EyePACS, a teleretinal screening service in the USA) of 3678 eyes with known outcomes and an external validation set (from Thailand) of 2345 eyes with known outcomes. Findings: The three-field deep-learning system had an area under the receiver operating characteristic curve (AUC) of 0·79 (95% CI 0·77–0·81) in the internal validation set. Assessment of the external validation set, which contained only one-field colour fundus photographs, with the one-field deep-learning system gave an AUC of 0·70 (0·67–0·74). In the internal validation set, the AUC of available risk factors was 0·72 (0·68–0·76), which improved to 0·81 (0·77–0·84) after combining the deep-learning system with these risk factors (p<0·0001). In the external validation set, the corresponding AUC improved from 0·62 (0·58–0·66) to 0·71 (0·68–0·75; p<0·0001) following the addition of the deep-learning system to available risk factors.
Interpretation: The deep-learning systems predicted diabetic retinopathy development using colour fundus photographs, and the systems were independent of and more informative than available risk factors. Such a risk stratification tool might help to optimise screening intervals to reduce costs while improving vision-related outcomes.
    Background: Patients with neovascular age-related macular degeneration (AMD) can avoid vision loss via certain therapies. However, methods to predict progression to neovascular AMD (nvAMD) are lacking. Purpose: To develop and validate a deep learning (DL) algorithm to predict 1-year progression to nvAMD of eyes with no, early, or intermediate AMD, using color fundus photographs (CFP). Design: Development and validation of a DL algorithm. Methods: We trained a DL algorithm to predict 1-year progression to nvAMD, and used 10-fold cross-validation to evaluate this approach on two groups of eyes in the Age-Related Eye Disease Study (AREDS): none/early/intermediate AMD, and intermediate AMD (iAMD) only. We compared the DL algorithm to the manually graded 4-category and 9-step scales in the AREDS dataset. Main outcome measures: Performance of the DL algorithm was evaluated using the sensitivity at 80% specificity for progression to nvAMD. Results: The DL algorithm's sensitivity for predicting progression to nvAMD from none/early/iAMD (78 ± 6%) was higher than that of manual grades from the 9-step scale (67 ± 8%) or the 4-category scale (48 ± 3%). For predicting progression specifically from iAMD, the DL algorithm's sensitivity (57 ± 6%) was also higher than that of the 9-step grades (36 ± 8%) and the 4-category grades (20 ± 0%). Conclusions: Our DL algorithm performed better than manual grades in predicting progression to nvAMD. Future investigations are required to test the application of this DL algorithm in a real-world clinical setting.