Naama Hammel
Naama is a clinical research scientist at Google Health. In this role she focuses on developing and validating machine learning for medical applications across multiple fields, including ophthalmology, dermatology, and others. Naama is an ophthalmologist with a subspecialty in glaucoma. She completed her medical and ophthalmology training at Tel-Aviv University; her glaucoma fellowship at the Shiley Eye Institute, UC San Diego; and her ophthalmic informatics fellowship at the UC Davis Eye Center.
Authored Publications
Conversational AI in health: Design considerations from a Wizard-of-Oz dermatology case study with users, clinicians and a medical LLM
Brenna Li
Amy Wang
Patricia Strachan
Julie Anne Seguin
Sami Lachgar
Karyn Schroeder
Renee Wong
Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, Association for Computing Machinery, 10 pages
Although skin concerns are common, access to specialist care is limited. Artificial intelligence (AI)-assisted tools to support medical decisions may provide patients with feedback on their concerns while also helping ensure the most urgent cases are routed to dermatologists. Although AI-based conversational agents have been explored recently, how they are perceived by patients and clinicians is not well understood. We conducted a Wizard-of-Oz study involving 18 participants with real skin concerns. Participants were randomly assigned to interact with either a clinician agent (portrayed by a dermatologist) or an LLM agent (supervised by a dermatologist) via synchronous multimodal chat. In both conditions, participants found the conversation helpful for understanding their medical situation and alleviating their concerns. Through qualitative coding of the conversation transcripts, we provide insight into the importance of empathy and effective information-seeking. We conclude with design considerations for future AI-based conversational agents in healthcare settings.
A deep learning model for novel systemic biomarkers in photographs of the external eye: a retrospective study
Ilana Traynis
Christina Chen
Akib Uddin
Jorge Cuadros
Lauren P. Daskivich
April Y. Maa
Ramasamy Kim
Eugene Yu-Chuan Kang
Lily Peng
Avinash Varadarajan
The Lancet Digital Health (2023)
Background
Photographs of the external eye were recently shown to reveal signs of diabetic retinal disease and elevated glycated haemoglobin. This study aimed to test the hypothesis that external eye photographs contain information about additional systemic medical conditions.
Methods
We developed a deep learning system (DLS) that takes external eye photographs as input and predicts systemic parameters, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets). This DLS was trained using 123,130 images from 38,398 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles County, CA, USA. Evaluation focused on nine prespecified systemic parameters and leveraged three validation sets (A, B, C) spanning 25,510 patients with and without diabetes undergoing eye screening in three independent sites in Los Angeles County, CA, and the greater Atlanta area, GA, USA. We compared performance against baseline models incorporating available clinicodemographic variables (e.g., age, sex, race and ethnicity, years with diabetes).
Findings
Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST >36.0 U/L, calcium <8.6 mg/dL, eGFR <60.0 mL/min/1.73 m², haemoglobin <11.0 g/dL, platelets <150.0 × 10³/μL, ACR ≥300 mg/g, and WBC <4.0 × 10³/μL on validation set A (a population resembling the development datasets), with the area under the receiver operating characteristic curve (AUC) of the DLS exceeding that of the baseline by 5.3–19.9% (absolute differences in AUC). On validation sets B and C, with substantial patient population differences compared with the development datasets, the DLS outperformed the baseline for ACR ≥300.0 mg/g and haemoglobin <11.0 g/dL by 7.3–13.2%.
Interpretation
We found further evidence that external eye photographs contain biomarkers spanning multiple organ systems. Such biomarkers could enable accessible and non-invasive screening of disease. Further work is needed to understand the translational implications.
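The headline result here is an absolute difference in AUC between the image-based DLS and a clinicodemographic baseline. As a rough illustration of that kind of comparison, a minimal sketch with made-up data and hypothetical variable names (not the study's code):

```python
# Minimal sketch: AUC of an image-based DLS vs. a logistic-regression
# baseline that only sees clinicodemographic variables. All data below
# are simulated stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)  # 1 = parameter beyond threshold (e.g. AST >36 U/L)

# Stand-ins for DLS output scores and clinicodemographic features
# (age, sex, race and ethnicity, years with diabetes, ...).
dls_scores = 0.6 * y + rng.normal(0, 0.4, size=n)
clinico = np.column_stack([0.2 * y + rng.normal(0, 1, size=n),
                           rng.normal(0, 1, size=n)])

# Fit the baseline on one half; evaluate both models on the held-out half.
half = n // 2
baseline = LogisticRegression().fit(clinico[:half], y[:half])
base_scores = baseline.predict_proba(clinico[half:])[:, 1]

auc_dls = roc_auc_score(y[half:], dls_scores[half:])
auc_base = roc_auc_score(y[half:], base_scores)
print(f"absolute AUC difference: {auc_dls - auc_base:+.3f}")
```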
Beyond Predictions: Explainability and Learning from Machine Learning
Chih-Ying Deng
Akinori Mitani
Christina Chen
Lily Peng
Digital Eye Care and Teleophthalmology, Springer (2023)
The intense interest in developing machine learning (ML) models for applications in ophthalmology has produced many potentially useful tools for disease detection, grading, and prognostication. However, though many of these efforts have produced well-validated models, the inner workings of these methods may not be easily understood by many clinicians, patients, and even ML practitioners. In this chapter, we focus on ML model explainability. We begin by highlighting the utility and importance of explainability before presenting a clinician-accessible explanation of the commonly used methods and the types of insights these methods provide. Next, we present several case studies of ML studies incorporating explainability and describe these studies' strengths as well as limitations. Finally, we discuss the important work that lies ahead and how explainability may eventually help push the frontiers of scientific knowledge by enabling human experts to learn from what the machine has learned.
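For a flavor of the kind of method such a chapter surveys, here is a minimal NumPy sketch of integrated gradients, one commonly used attribution technique; it is illustrative only (not taken from the chapter), and `predict_grad` is a hypothetical stand-in for the gradient of a model's output with respect to its input:

```python
import numpy as np

def integrated_gradients(image, baseline, predict_grad, steps=50):
    """Attribute a prediction to input pixels by averaging gradients along
    the straight path from a baseline image (e.g. all black) to the input,
    then scaling by the input difference."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([predict_grad(baseline + a * (image - baseline))
                      for a in alphas])
    return (image - baseline) * grads.mean(axis=0)

# Toy usage: for a linear model f(x) = w . x the gradient is constant,
# so the attributions reduce to w * x.
weights = np.array([0.2, -0.5, 0.3])
attributions = integrated_gradients(
    image=np.array([1.0, 2.0, 3.0]),
    baseline=np.zeros(3),
    predict_grad=lambda x: weights,
)
print(attributions)  # [0.2, -1.0, 0.9]
```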
Lessons learned from translating AI from development to deployment in healthcare
Sunny Virmani
Jay Nayar
Elin Rønby Pedersen
Divleen Jeji
Lily Peng
Nature Medicine (2023)
The application of an artificial intelligence (AI)-based screening tool for retinal disease in India and Thailand highlighted the myths and realities of introducing medical AI, offering lessons that may form a framework for subsequent tools.
Using generative AI to investigate medical imagery models and datasets
Charles Lau
Chloe Nichols
Doron Yaya-Stupp
Heather Cole-Lewis
Ilana Traynis
arXiv (2022)
AI models have shown promise in performing many medical imaging tasks. However, our ability to explain what signals these models learn from the training data is severely lacking. Explanations are needed to increase the trust of doctors in AI-based models, especially in domains where AI prediction capabilities surpass those of humans. Moreover, such explanations could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts. In this paper, we present a method for automatic visual explanations that can help achieve these goals by generating hypotheses of what visual signals in the images are correlated with the task. We propose the following four steps: (i) Train a classifier to perform a given task to assess whether the imagery indeed contains signals relevant to the task; (ii) Train a StyleGAN-based image generator with an architecture that enables guidance by the classifier (“StylEx”); (iii) Automatically detect and extract the top visual attributes that the classifier is sensitive to. Each of these attributes can then be independently modified for a set of images to generate counterfactual visualizations of those attributes (i.e. what that image would look like with the attribute increased or decreased); (iv) Present the discovered attributes and corresponding counterfactual visualizations to a multidisciplinary panel of experts to formulate hypotheses for the underlying mechanisms with consideration of social and structural determinants of health (e.g. whether the attributes correspond to known patho-physiological or socio-cultural phenomena, or could be novel discoveries) and stimulate future research. To demonstrate the broad applicability of our approach, we present results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples where many of the automatically learned attributes clearly capture clinically known features (e.g., types of cataract, enlarged heart), and demonstrate automatically learned confounders that arise from factors beyond physiological mechanisms (e.g., chest X-ray underexposure is correlated with the classifier predicting abnormality, and eye makeup is correlated with the classifier predicting low hemoglobin levels). We further show that our method reveals a number of physiologically plausible novel attributes for future investigation (e.g., differences in the fundus associated with self-reported sex, which were previously unknown). While our approach is not able to discern causal pathways, the ability to generate hypotheses from the attribute visualizations has the potential to enable researchers to better understand AI-based models, improve their assessment of these models, and extract new knowledge from them. Importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real-world nature of healthcare delivery and socio-cultural factors, and hence multidisciplinary perspectives are critical in these investigations. Finally, we release code to enable researchers to train their own StylEx models and analyze their predictive tasks of interest, and use the methodology presented in this paper for responsible interpretation of the revealed attributes.
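A hedged sketch of the attribute-ranking idea behind step (iii), not the released StylEx code: perturb one style coordinate at a time, decode the image, and rank coordinates by how far the classifier's output moves. Here `encode`, `generator`, and `classifier` are hypothetical stand-ins:

```python
import numpy as np

def rank_attributes(images, encode, generator, classifier, delta=2.0, top_k=5):
    """Rank style-space coordinates by their average effect on the classifier."""
    effects = []
    for image in images:
        style = encode(image)                    # latent style vector
        base = classifier(generator(style))      # score on the reconstruction
        per_coord = []
        for i in range(style.shape[0]):
            perturbed = style.copy()
            perturbed[i] += delta                # nudge one attribute
            per_coord.append(classifier(generator(perturbed)) - base)
        effects.append(per_coord)
    mean_effect = np.abs(np.mean(effects, axis=0))
    return np.argsort(mean_effect)[::-1][:top_k]  # most influential first

# Toy usage: a 4-dimensional style space where only coordinate 2 matters.
toy_images = [np.ones(4) for _ in range(3)]
print(rank_attributes(toy_images,
                      encode=lambda img: img.astype(float),
                      generator=lambda s: s,
                      classifier=lambda x: float(x[2]),
                      top_k=2))  # coordinate 2 ranks first
```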
Detection of signs of disease in external photographs of the eyes via deep learning
Akinori Mitani
Ilana Traynis
Naho Kitade
April Maa
Jorge Cuadros
Lily Hao Yi Peng
Avinash Vaidyanathan Varadarajan
Nature Biomedical Engineering (2022)
Retinal fundus photographs can be used to detect a range of retinal conditions. Here we show that deep-learning models trained instead on external photographs of the eyes can be used to detect diabetic retinopathy (DR), diabetic macular oedema and poor blood glucose control. We developed the models using eye photographs from 145,832 patients with diabetes from 301 DR screening sites and evaluated the models on four tasks and four validation datasets with a total of 48,644 patients from 198 additional screening sites. For all four tasks, the predictive performance of the deep-learning models was significantly higher than the performance of logistic regression models using self-reported demographic and medical history data, and the predictions generalized to patients with dilated pupils, to patients from a different DR screening programme and to a general eye care programme that included diabetics and non-diabetics. We also explored the use of the deep-learning models for the detection of elevated lipid levels. The utility of external eye photographs for the diagnosis and management of diseases should be further validated with images from different cameras and patient populations.
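One way to make "significantly higher than the performance of logistic regression models" concrete is a bootstrap confidence interval on the AUC gap. A minimal sketch under assumed variable names (not the paper's analysis):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc_gap(y, dls_scores, baseline_scores, n_boot=2000, seed=0):
    """95% bootstrap CI for AUC(DLS) - AUC(baseline) over resampled patients."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))
        if len(np.unique(y[idx])) < 2:   # AUC needs both classes present
            continue
        gaps.append(roc_auc_score(y[idx], dls_scores[idx])
                    - roc_auc_score(y[idx], baseline_scores[idx]))
    return np.percentile(gaps, [2.5, 97.5])

# Toy usage with simulated scores; a CI entirely above zero would indicate
# the DLS reliably outperforms the baseline.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
dls = 0.7 * y + rng.normal(0, 0.5, size=500)
base = 0.2 * y + rng.normal(0, 0.5, size=500)
print(bootstrap_auc_gap(y, dls, base))
```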
Deep learning to detect optical coherence tomography-derived diabetic macular edema from retinal photographs: a multicenter validation study
Xinle Sheila Liu
Tayyeba Ali
Ami Shah
Scott Mayer McKinney
Paisan Ruamviboonsuk
Angus W. Turner
Pearse A. Keane
Peranut Chotcomwongse
Variya Nganthavee
Mark Chia
Josef Huemer
Jorge Cuadros
Rajiv Raman
Lily Hao Yi Peng
Avinash Vaidyanathan Varadarajan
Reena Chopra
Ophthalmology Retina (2022)
Purpose
To validate the generalizability of a deep learning system (DLS) that detects diabetic macular edema (DME) from two-dimensional color fundus photography (CFP), where the reference standard for retinal thickness and fluid presence is derived from three-dimensional optical coherence tomography (OCT).
Design
Retrospective validation of a DLS across international datasets.
Participants
Paired CFP and OCT of patients from diabetic retinopathy (DR) screening programs or retina clinics. The DLS was developed using datasets from Thailand, the United Kingdom (UK), and the United States and validated using 3,060 unique eyes from 1,582 patients across screening populations in Australia, India, and Thailand. The DLS was separately validated in 698 eyes from 537 screened patients in the UK with mild DR and suspicion of DME based on CFP.
Methods
The DLS was trained using DME labels from OCT. Presence of DME was based on retinal thickening or intraretinal fluid. The DLS’s performance was compared to expert grades of maculopathy and to a previous proof-of-concept version of the DLS. We further simulated integration of the current DLS into an algorithm trained to detect DR from CFPs.
Main Outcome Measures
Superiority of specificity and non-inferiority of sensitivity of the DLS for the detection of center-involving DME, using device-specific thresholds, compared to experts.
Results
Primary analysis in a combined dataset spanning Australia, India, and Thailand showed the DLS had 80% specificity and 81% sensitivity, compared to expert graders who had 59% specificity and 70% sensitivity. Relative to human experts, the DLS had significantly higher specificity (p=0.008) and non-inferior sensitivity (p<0.001). In the UK dataset, the DLS had a specificity of 80% (p<0.001 for specificity >50%) and a sensitivity of 100% (p=0.02 for sensitivity >90%).
Conclusions
The DLS can generalize to multiple international populations with an accuracy exceeding that of experts. The clinical value of this DLS in reducing false positive referrals, thus decreasing the burden on specialist eye care, warrants prospective evaluation.
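The main outcome reduces to sensitivity and specificity of thresholded DLS scores against the OCT-derived reference, compared with binary expert grades. A toy sketch of that computation (hypothetical numbers; in practice the threshold is chosen per capture device):

```python
import numpy as np

def sens_spec(pred, truth):
    """Sensitivity and specificity of binary predictions vs. a reference."""
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    return tp / np.sum(truth == 1), tn / np.sum(truth == 0)

# Hypothetical OCT-derived reference, DLS scores, and expert grades.
truth = np.array([1, 1, 0, 0, 0, 1, 0, 0])
dls_scores = np.array([0.9, 0.7, 0.2, 0.4, 0.1, 0.8, 0.6, 0.3])
expert = np.array([1, 0, 0, 1, 0, 1, 1, 0])

threshold = 0.5  # device-specific operating point
print("DLS   :", sens_spec((dls_scores >= threshold).astype(int), truth))
print("Expert:", sens_spec(expert, truth))
```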
Retinal fundus photographs capture hemoglobin loss after blood donation
Akinori Mitani
Ilana Traynis
Lily Hao Yi Peng
Avinash Vaidyanathan Varadarajan
medRxiv (2022)
Recently it was shown that blood hemoglobin concentration could be predicted from retinal fundus photographs by deep learning models. However, it is unclear whether the models were quantifying current blood hemoglobin level or estimating it based on subjects' pretest probability of having anemia. Here, we conducted an observational study with 14 volunteers who donated blood at an on-site blood drive held by the local blood center (at which approximately 10% of their blood was removed). When the deep learning model was applied to retinal fundus photographs taken before and after blood donation, it detected a decrease in blood hemoglobin concentration within each subject 2–3 days after donation, suggesting that the model was quantifying subacute hemoglobin changes rather than predicting subjects' risk. Additional randomized or controlled studies can further validate this finding.
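The design is a paired, within-subject comparison of model-predicted haemoglobin before and after donation; a small SciPy sketch with simulated numbers standing in for the study's predictions:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
# Simulated model predictions (g/dL) for 14 volunteers, before donation and
# 2-3 days after, with a drop reflecting ~10% blood loss.
before = 14.0 + rng.normal(0, 0.5, size=14)
after = before - 0.8 + rng.normal(0, 0.3, size=14)

stat, p = wilcoxon(before, after)  # paired, non-parametric test
print(f"median within-subject drop: {np.median(before - after):.2f} g/dL (p={p:.4f})")
```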
Large-scale machine learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology
Babak Alipanahi
Babak Behsaz
Zachary Ryan Mccaw
Emanuel Schorsch
Lizzie Dorfman
Sonia Phene
Andrew Walker Carroll
Anthony Khawaja
American Journal of Human Genetics (2021)
Genome-wide association studies (GWAS) require accurate cohort phenotyping, but expert labeling can be costly, time-intensive, and variable. Here we develop a machine learning (ML) model to predict glaucomatous features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; P≤5×10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR, with select loci near genes involved in neuronal and synaptic biology or known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR in independent datasets.
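The counting reported here (genome-wide significant hits grouped into loci) can be illustrated with a toy filter over GWAS summary statistics; the distance-based merging below is a deliberate simplification, not the study's clumping procedure:

```python
import pandas as pd

def significant_loci(sumstats: pd.DataFrame, window: int = 250_000):
    """Count GWS variants (P <= 5e-8) and merge nearby hits into loci."""
    hits = sumstats[sumstats["p"] <= 5e-8].sort_values(["chrom", "pos"])
    loci, last_chrom, last_pos = 0, None, None
    for _, row in hits.iterrows():
        if row["chrom"] != last_chrom or row["pos"] - last_pos > window:
            loci += 1  # far from the previous hit: start a new locus
        last_chrom, last_pos = row["chrom"], row["pos"]
    return len(hits), loci

# Toy summary statistics: two nearby hits on chr1 plus one hit on chr2.
df = pd.DataFrame({"chrom": ["1", "1", "2"],
                   "pos": [100_000, 150_000, 500_000],
                   "p": [1e-9, 3e-8, 2e-10]})
print(significant_loci(df))  # (3 significant variants, 2 loci)
```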
Iterative quality control strategies for expert medical image labeling
Sonia Phene
Abigail Huang
Rebecca Ackermann
Olga Kanzheleva
Caitlin Taggart
Proceedings of the AAAI Conference on Human Computation and Crowdsourcing (2021)
Data quality is a key concern for artificial intelligence (AI) efforts that rely upon crowdsourced data collection. In the domain of medicine in particular, labeled data must meet higher quality standards, or the resulting AI may lead to patient harm and/or perpetuate biases. What are the challenges involved in expert medical labeling? What processes do such teams employ? In this study, we interviewed members of teams developing AI for medical imaging across four subdomains (ophthalmology, radiology, pathology, and dermatology). We identify a set of common practices for ensuring data quality. We describe one instance of low-quality labeling caught by post-launch monitoring. However, the more common pattern is to involve experts in an iterative process of defining, testing, and iterating on tasks and instructions. Teams invest in these upstream efforts to mitigate downstream quality issues during large-scale labeling.