Jump to Content

Health & Bioscience

Research in health and biomedical sciences has a unique potential to improve peoples’ lives, and includes work ranging from basic science that aims to understand biology, to diagnosing individuals’ diseases, to epidemiological studies of whole populations. We recognize that our strengths in machine learning, large-scale computing, and human-computer interaction can help accelerate the progress of research in this space. By collaborating with world-class institutions and researchers and engaging in both early-stage research and late-stage work, we hope to help people live healthier, longer, and more productive lives.

Recent Publications

Validation of a deep learning system for the detection of diabetic retinopathy in Indigenous Australians
Mark Chia
Fred Hersch
Pearse Keane
Angus Turner
British Journal of Ophthalmology, vol. 108 (2024), pp. 268-273
Preview abstract Background/aims: Deep learning systems (DLSs) for diabetic retinopathy (DR) detection show promising results but can underperform in racial and ethnic minority groups, therefore external validation within these populations is critical for health equity. This study evaluates the performance of a DLS for DR detection among Indigenous Australians, an understudied ethnic group who suffer disproportionately from DR-related blindness. Methods: We performed a retrospective external validation study comparing the performance of a DLS against a retinal specialist for the detection of more-than-mild DR (mtmDR), vision-threatening DR (vtDR) and all-cause referable DR. The validation set consisted of 1682 consecutive, single-field, macula-centred retinal photographs from 864 patients with diabetes (mean age 54.9 years, 52.4% women) at an Indigenous primary care service in Perth, Australia. Three-person adjudication by a panel of specialists served as the reference standard. Results: For mtmDR detection, sensitivity of the DLS was superior to the retina specialist (98.0% (95% CI, 96.5 to 99.4) vs 87.1% (95% CI, 83.6 to 90.6), McNemar’s test p<0.001) with a small reduction in specificity (95.1% (95% CI, 93.6 to 96.4) vs 97.0% (95% CI, 95.9 to 98.0), p=0.006). For vtDR, the DLS’s sensitivity was again superior to the human grader (96.2% (95% CI, 93.4 to 98.6) vs 84.4% (95% CI, 79.7 to 89.2), p<0.001) with a slight drop in specificity (95.8% (95% CI, 94.6 to 96.9) vs 97.8% (95% CI, 96.9 to 98.6), p=0.002). For all-cause referable DR, there was a substantial increase in sensitivity (93.7% (95% CI, 91.8 to 95.5) vs 74.4% (95% CI, 71.1 to 77.5), p<0.001) and a smaller reduction in specificity (91.7% (95% CI, 90.0 to 93.3) vs 96.3% (95% CI, 95.2 to 97.4), p<0.001). Conclusion: The DLS showed improved sensitivity and similar specificity compared with a retina specialist for DR detection. This demonstrates its potential to support DR screening among Indigenous Australians, an underserved population with a high burden of diabetic eye disease. View details
Preview abstract Importance: Interest in artificial intelligence (AI) has reached an all-time high, and health care leaders across the ecosystem are faced with questions about where, when, and how to deploy AI and how to understand its risks, problems, and possibilities. Observations: While AI as a concept has existed since the 1950s, all AI is not the same. Capabilities and risks of various kinds of AI differ markedly, and on examination 3 epochs of AI emerge. AI 1.0 includes symbolic AI, which attempts to encode human knowledge into computational rules, as well as probabilistic models. The era of AI 2.0 began with deep learning, in which models learn from examples labeled with ground truth. This era brought about many advances both in people’s daily lives and in health care. Deep learning models are task-specific, meaning they do one thing at a time, and they primarily focus on classification and prediction. AI 3.0 is the era of foundation models and generative AI. Models in AI 3.0 have fundamentally new (and potentially transformative) capabilities, as well as new kinds of risks, such as hallucinations. These models can do many different kinds of tasks without being retrained on a new dataset. For example, a simple text instruction will change the model’s behavior. Prompts such as “Write this note for a specialist consultant” and “Write this note for the patient’s mother” will produce markedly different content. Conclusions and Relevance: Foundation models and generative AI represent a major revolution in AI’s capabilities, ffering tremendous potential to improve care. Health care leaders are making decisions about AI today. While any heuristic omits details and loses nuance, the framework of AI 1.0, 2.0, and 3.0 may be helpful to decision-makers because each epoch has fundamentally different capabilities and risks. View details
Preview abstract Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components—GPPEs—from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended. View details
Mindful Breathing as an Effective Technique in the Management of Hypertension
Aravind Natarajan
Hulya Emir-Farinas
Hao-Wei Su
Frontiers in Physiology, vol. N/A (2024), N/A
Preview abstract Introduction: Hypertension is one of the most important, modifiable risk factors for cardiovascular disease. The popularity of wearable devices provides an opportunity to test whether device guided slow mindful breathing may serve as a non-pharmacological treatment in the management of hypertension. Methods: Fitbit Versa-3 and Sense devices were used for this study. In addition, participants were required to own an FDA or Health Canada approved blood pressure measuring device. Advertisements were shown to 655,910 Fitbit users, of which 7,365 individuals expressed interest and filled out the initial survey. A total of 1,918 participants entered their blood pressure readings on at least 1 day and were considered enrolled in the study. Participants were instructed to download a guided mindful breathing app on their smartwatch device, and to engage with the app once a day prior to sleep. Participants measured their systolic and diastolic blood pressure prior to starting each mindful breathing session, and again after completion. All measurements were self reported. Participants were located in the United States or Canada. Results: Values of systolic and diastolic blood pressure were reduced following mindful breathing. There was also a decrease in resting systolic and diastolic measurements when measured over several days. For participants with a systolic pressure ≥ 130 mmHg, there was a decrease of 9.7 mmHg following 15 min of mindful breathing at 6 breaths per minute. When measured over several days, the resting systolic pressure decreased by an average of 4.3 mmHg. Discussion: Mindful breathing for 15 min a day, at a rate of 6 breaths per minute is effective in lowering blood pressure, and has both an immediate, and a short term effect (over several days). This large scale study demonstrates that device guided mindful breathing with a consumer wearable for 15 min a day is effective in lowering blood pressure, and a helpful complement to the standard of care. View details
Artificial Intelligence in Healthcare: A Perspective from Google
Lily Peng
Lisa Lehmann
Artificial Intelligence in Healthcare, Elsevier (2024)
Preview abstract Artificial Intelligence (AI) holds the promise of transforming healthcare by improving patient outcomes, increasing accessibility and efficiency, and decreasing the cost of care. Realizing this vision of a healthier world for everyone everywhere requires partnerships and trust between healthcare systems, clinicians, payers, technology companies, pharmaceutical companies, and governments to drive innovations in machine learning and artificial intelligence to patients. Google is one example of a technology company that is partnering with healthcare systems, clinicians, and researchers to develop technology solutions that will directly improve the lives of patients. In this chapter we share landmark trials of the use of AI in healthcare. We also describe the application of our novel system of organizing information to unify data in electronic health records (EHRs) and bring an integrated view of patient records to clinicians. We discuss our consumer focused innovation in dermatology to help guide search journeys for personalized information about skin conditions. Finally, we share a perspective on how to embed ethics and a concern for all patients into the development of AI. View details
Differences between Patient and Clinician Submitted Images: Implications for Virtual Care of Skin Conditions
Grace Eunhae Hong
Margaret Ann Smith
Aaron Loh
Vijaytha Muralidharan
Doris Wong
Michelle Phung
Nicolas Betancourt
Bradley Fong
Rachna Sahasrabudhe
Khoban Nasim
Alec Eschholz
Kat Chou
Peggy Bui
Justin Ko
Steven Lin
Mayo Clinic Proceedings: Digital Health (2024)
Preview abstract Objective: To understand and highlight the differences in clinical, demographic, and image quality characteristics between patient-taken (PAT) and clinic-taken (CLIN) photographs of skin conditions. Patients and Methods: This retrospective study applied logistic regression to data from 2500 deidentified cases in Stanford Health Care’s eConsult system, from November 2015 to January 2021. Cases with undiagnosable or multiple conditions or cases with both patient and clinician image sources were excluded, leaving 628 PAT cases and 1719 CLIN cases. Demographic characteristic factors, such as age and sex were self-reported, whereas anatomic location, estimated skin type, clinical signs and symptoms, condition duration, and condition frequency were summarized from patient health records. Image quality variables such as blur, lighting issues and whether the image contained skin, hair, or nails were estimated through a deep learning model. Results: Factors that were positively associated with CLIN photographs, post-2020 were as follows: age 60 years or older, darker skin types (eFST V/VI), and presence of skin growths. By contrast, factors that were positively associated with PAT photographs include conditions appearing intermittently, cases with blurry photographs, photographs with substantial nonskin (or nail/hair) regions and cases with more than 3 photographs. Within the PAT cohort, older age was associated with blurry photographs. Conclusion: There are various demographic, clinical, and image quality characteristic differences between PAT and CLIN photographs of skin concerns. The demographic characteristic differences present important considerations for improving digital literacy or access, whereas the image quality differences point to the need for improved patient education and better image capture workflows, particularly among elderly patients. View details