Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 133 publications
Accurate human genome analysis with Element Avidity sequencing
Andrew Carroll
Daniel Cook
Lucas Brambrink
Bryan Lajoie
Kelly N. Wiseman
Sophie Billings
Semyon Kruglyak
Bryan R. Lajoie
Junhua Zhao
Shawn E. Levy
Kishwar Shafin
Maria Nattestad
BMC Bioinformatics (2025)
Preview abstract
We investigate the new sequencing technology Avidity from Element Biosciences. We show that Avidity whole genome sequencing matches mapping and variant calling accuracy with Illumina at high coverages (30x-50x) and is noticeably more accurate at lower coverages (20x-30x). We quantify base error rates of Element reads, finding lower error rates, especially in homopolymer and tandem repeat regions. We use Element’s ability to generate paired end sequencing with longer insert sizes than typical short–read sequencing. We show that longer insert sizes result in even higher accuracy, with long insert Element sequencing giving noticeably more accurate genome analyses at all coverages.
View details
Global prediction of extreme floods in ungauged watersheds
Vusumuzi Dube
Martin Gauch
Shaun Harrigan
Daniel Klotz
Frederik Kratzert
Asher Metzger
Sella Nevo
Florian Pappenberger
Christel Prudhomme
Guy Shalev
Shlomo Shenzis
Tadele Yednkachw Tekalign
Dana Weitzner
Nature (2024)
Preview abstract
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks. Accurate and timely warnings are critical for mitigating flood risks, but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that AI-based forecasting achieves reliability in predicting extreme riverine events in ungauged watersheds at up to a 5-day lead time that is similar to or better than the reliability of nowcasts (0-day lead time) from a current state of the art global modeling system (the Copernicus Emergency Management Service Global Flood Awareness System). Additionally, we achieve accuracies over 5-year return period events that are similar to or better than current accuracies over 1-year return period events. This means that AI can provide flood warnings earlier and over larger and more impactful events in ungauged basins. The model developed in this paper was incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
View details
Towards Generalist Biomedical AI
Danny Driess
Andrew Carroll
Chuck Lau
Ryutaro Tanno
Ira Ktena
Basil Mustafa
Aakanksha Chowdhery
Simon Kornblith
Philip Mansfield
Sushant Prakash
Renee Wong
Sunny Virmani
Sara Mahdavi
Bradley Green
Ewa Dominowska
Joelle Barral
Karan Singhal
Pete Florence
NEJM AI (2024)
Preview abstract
BACKGROUND: Medicine is inherently multimodal, requiring the simultaneous interpretation and integration of insights between many data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence systems that flexibly encode, integrate, and interpret these data might better enable impactful applications ranging from scientific discovery to care delivery.
METHODS: To catalyze development of these models, we curated MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks, such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduced Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. To further probe the capabilities and limitations of Med-PaLM M, we conducted a radiologist evaluation of model-generated (and human) chest x-ray reports.
RESULTS: We observed encouraging performance across model scales. Med-PaLM M reached performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. In a side-by-side ranking on 246 retrospective chest x-rays, clinicians expressed a pairwise preference for Med-PaLM Multimodal reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility.
CONCLUSIONS: Although considerable work is needed to validate these models in real-world cases and understand if cross-modality generalization is possible, our results represent a milestone toward the development of generalist biomedical artificial intelligence systems.
View details
An intentional approach to managing bias in embedding models
Atilla P. Kiraly
Jungyeon Park
Rory Pilgrim
Charles Lau
Heather Cole-Lewis
Shravya Shetty
Krish Eswaran
Leo Anthony Celi
The Lancet Digital Health, 6 (2024), E126-E130
Preview abstract
Advances in machine learning for health care have brought concerns about bias from the research community; specifically, the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard to pinpoint by both algorithms and people. This finding raises a question about how to best design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components—GPPEs—from learning sensitive attributes can have unintended consequences on the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or poorly performing. We present reasons, by building on previously published data, to support the reasoning that GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.
View details
Demystifying Embedding Spaces using Large Language Models
Jihwan Jeong
Lior Shani
Aza Tulepbergenov
Martin Mladenov
The Twelfth International Conference on Learning Representations (2024)
Preview abstract
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing large language models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
View details
Preview abstract
Artificial Intelligence (AI) holds the promise of transforming healthcare by improving patient outcomes, increasing accessibility and efficiency, and decreasing the cost of care. Realizing this vision of a healthier world for everyone everywhere requires partnerships and trust between healthcare systems, clinicians, payers, technology companies, pharmaceutical companies, and governments to drive innovations in machine learning and artificial intelligence to patients. Google is one example of a technology company that is partnering with healthcare systems, clinicians, and researchers to develop technology solutions that will directly improve the lives of patients. In this chapter we share landmark trials of the use of AI in healthcare. We also describe the application of our novel system of organizing information to unify data in electronic health records (EHRs) and bring an integrated view of patient records to clinicians. We discuss our consumer focused innovation in dermatology to help guide search journeys for personalized information about skin conditions. Finally, we share a perspective on how to embed ethics and a concern for all patients into the development of AI.
View details
Preview abstract
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks. Accurate and timely warnings are critical for mitigating flood risks, but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that AI-based forecasting achieves reliability in predicting extreme riverine events in ungauged watersheds at up to a 5-day lead time that is similar to or better than the reliability of nowcasts (0-day lead time) from a current state of the art global modeling system (the Copernicus Emergency Management Service Global Flood Awareness System). Additionally, we achieve accuracies over 5-year return period events that are similar to or better than current accuracies over 1-year return period events. This means that AI can provide flood warnings earlier and over larger and more impactful events in ungauged basins. The model developed in this paper was incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
View details
Preview abstract
Importance: Interest in artificial intelligence (AI) has reached an all-time high, and health care leaders across the ecosystem are faced with questions about where, when, and how to deploy AI and how to understand its risks, problems, and possibilities.
Observations: While AI as a concept has existed since the 1950s, all AI is not the same. Capabilities and risks of various kinds of AI differ markedly, and on examination 3 epochs of AI emerge. AI 1.0 includes symbolic AI, which attempts to encode human knowledge into computational rules, as well as probabilistic models. The era of AI 2.0 began with deep learning, in which models learn from examples labeled with ground truth. This era brought about many advances both in people’s daily lives and in health care. Deep learning models are task-specific, meaning they do one thing at a time, and they primarily focus on classification and prediction. AI 3.0 is the era of foundation models and generative AI. Models in AI 3.0 have fundamentally new (and potentially transformative) capabilities, as well as new kinds of risks, such as hallucinations. These models can do many different kinds of tasks without being retrained on a new dataset. For example, a simple text instruction will change the model’s behavior. Prompts such as “Write this note for a specialist consultant” and “Write this note for the patient’s mother” will produce markedly different content.
Conclusions and Relevance: Foundation models and generative AI represent a major revolution in AI’s capabilities, ffering tremendous potential to improve care. Health care leaders are making decisions about AI today. While any heuristic omits details and loses nuance, the framework of AI 1.0, 2.0, and 3.0 may be helpful to decision-makers because each epoch has fundamentally different capabilities and risks.
View details
Preview abstract
As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses.
View details
Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction
Babak Behsaz
Zachary Ryan Mccaw
Davin Hill
Robert Luben
Dongbing Lai
John Bates
Howard Yang
Tae-Hwi Schwantes-An
Yuchen Zhou
Anthony Khawaja
Andrew Carroll
Brian Hobbs
Michael Cho
Nature Genetics (2024)
Preview abstract
Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very few labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD—spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.
View details
Local read haplotagging enables accurate long-read small variant calling
Daniel Cook
Maria Nattestad
Lucas Brambrink
Brandy McNulty
John E. Gorzynski
Sneha D. Goenka
Euan Ashley
Miten Jain
Karen Miga
Benedict Paten
Andrew Carroll
Kishwar Shafin
Nature Communications (2024)
Preview abstract
Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.
View details
Preview abstract
We study the space complexity of the two related fields of differential privacy and adaptive data analysis. Specifically,
(1) Under standard cryptographic assumptions, we show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy, compared to the space needed without privacy. To the best of our knowledge, this is the first separation between the space complexity of private and non-private algorithms.
(2) The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries. We revisit previous lower bounds at a foundational level, and show that they are a consequence of a space bottleneck rather than a sampling bottleneck.
To obtain our results, we define and construct an encryption scheme with multiple keys that is built to withstand a limited amount of key leakage in a very particular way.
View details
Preview abstract
We introduce the concurrent shuffle model of differential privacy. In this model we have multiple concurrent shufflers permuting messages from different, possibly overlapping, batches of users. Similarly to the standard (single) shuffle model, the privacy requirement is that the concatenation of all shuffled messages should be differentially private. We study the private continual summation problem (a.k.a. the counter problem) and show that
the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model. Specifically, we give a summation algorithm with error $\Tilde{O}(n^{1/(2k+1)})$ with $k$ concurrent shufflers on a sequence of length $n$. Furthermore, we prove that this bound is tight for any $k$, even if the algorithm can choose the sizes of the batches adaptively. For $k=\log n$ shufflers, the resulting error is polylogarithmic, much better than $\Tilde{\Theta}(n^{1/3})$ which we show is the smallest possible with a single shuffler.
We use our online summation algorithm to get algorithms with improved regret bounds for the contextual linear bandit problem. In particular we get optimal $\Tilde{O}(\sqrt{n})$ regret with $k= \Tilde{\Omega}(\log n)$ concurrent shufflers.
View details
Preview abstract
We develop a deep learning based convolutional-regression model that estimates the volumetric soil moisture content in the top ~5 cm. Input predictors include Sentinel-1 (active radar), Sentinel-2 (optical imagery), and SMAP (passive radar) as well as geophysical variables from SoilGrids and modelled soil moisture fields from GLDAS. The model was trained and evaluated on data from ~1300 in-situ sensors globally over the period 2015 - 2021 and obtained an average per-sensor correlation of 0.72 and ubRMSE of 0.0546. These results are benchmarked against 13 other soil moisture estimates at different locations, and an ablation study was used to identify important predictors.
View details
Cost-utility analysis of deep learning and trained human graders for diabetic retinopathy screening in a nationwide program
Attasit Srisubat
Kankamon Kittrongsiri
Sermsiri Sangroongruangsri
Chalida Khemvaranan
Jacqueline Shreibati
Fred Hersch
Prut Hanutsaha
Varis Ruamviboonsuk
Saowalak Turongkaravee
Rajiv Raman
Dr. Paisan Raumviboonsuk
Ophthalmology (2023)
Preview abstract
Introduction
Deep learning (DL) for screening diabetic retinopathy (DR) has the potential to address limited healthcare resources by enabling expanded access to healthcare. However, there is still limited health economic evaluation, particularly in low- and middle-income countries, on this subject to aid decision-making for DL adoption.
Methods
In the context of a middle-income country (MIC), using Thailand as a model, we constructed a decision tree-Markov hybrid model to estimate lifetime costs and outcomes of Thailand’s national DR screening program via DL and trained human graders (HG). We calculated the incremental cost-effectiveness ratio (ICER) between the two strategies. Sensitivity analyses were performed to probe the influence of modeling parameters.
Results
From a societal perspective, screening with DL was associated with a reduction in costs of ~ US$ 2.70, similar quality-adjusted life-years (QALY) of + 0.0043, and an incremental net monetary benefit of ~ US$ 24.10 in the base case. In sensitivity analysis, DL remained cost-effective even with a price increase from US$ 1.00 to US$ 4.00 per patient at a Thai willingness-to-pay threshold of ~ US$ 4.997 per QALY gained. When further incorporating recent findings suggesting improved compliance to treatment referral with DL, our analysis models effectiveness benefits of ~ US$ 20 to US$ 50 depending on compliance.
Conclusion
DR screening using DL in an MIC using Thailand as a model may result in societal cost-savings and similar health outcomes compared with HG. This study may provide an economic rationale to expand DL-based DR screening in MICs as an alternative solution for limited availability of skilled human resources for primary screening, particularly in MICs with similar prevalence of diabetes and low compliance to referrals for treatment.
View details