Science AI

Inspired by the ability of AI to help tackle the grand challenges in science, our work involves pursuing breakthrough discoveries and developing new tools to accelerate the progress of science. Together with the scientific community, our aim is to enable scientific innovation for a better world.

Decoding biological complexity

We use AI to tackle the biggest unsolved questions in biology, from understanding the genome to solving the mysteries of the brain. These insights address core questions that benefit biomedical researchers, and ultimately millions of patients affected by rare genetic and neurological conditions.

Modeling the Earth & environment

We use AI to improve humanity’s fundamental understanding of Earth’s complex systems: water, land, life and sky. By understanding our planet, we enable communities, researchers and governments to make more informed decisions that foster resilience and achieve a safer, healthier and more sustainable future for all.

Recent featured publications

An AI system to help scientists write expert-level empirical software
Johan Kartiwa
Matthew Abraham
Qian-Ze Zhu
Zahra Shamsi
Shibl Mourad
Julie Wang
Anastasiya Belyaeva
Scott Ellsworth
Yuchen Zhou
Jackson Cui
Grace Joseph
Malcolm Kane
Paul Raccuglia
Ryan Krueger
Jeffrey Cardille
Erica Brand
Renee Johnston
James Thompson
Chris Co
James Manyika
Anna Bulanova
David Smalling
Eser Aygün
Kat Chou
Gheorghe Comanici
arXiv (2025)
The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress. Keywords: Tree Search, Generative AI, Scorable Scientific Tasks, Empirical Software
Accurate somatic small variant discovery for multiple sequencing technologies with DeepSomatic
Jimin Park
Daniel E. Cook
Lucas Brambrink
Joshua Gardner
Brandy McNulty
Samuel Sacco
Ayse G. Keskus
Asher Bryant
Tanveer Ahmad
Jyoti Shetty
Yongmei Zhao
Bao Tran
Giuseppe Narzisi
Adrienne Helland
Byunggil Yoo
Irina Pushel
Lisa A. Lansdon
Chengpeng Bi
Adam Walter
Margaret Gibson
Tomi Pastinen
Rebecca Reiman
Sharvari Mankame
T. Rhyker Ranallo-Benavidez
Christine Brown
Nicolas Robine
Floris P. Barthel
Midhat S. Farooqi
Karen H. Miga
Andrew Carroll
Mikhail Kolmogorov
Benedict Paten
Kishwar Shafin
Nature Biotechnology (2025)
Somatic variant detection is an integral part of cancer genomics analysis. While most methods have focused on short-read sequencing, long-read technologies offer potential advantages in repeat mapping and variant phasing. We present DeepSomatic, a deep-learning method for detecting somatic small nucleotide variations and insertions and deletions from both short-read and long-read data. The method has modes for whole-genome and whole-exome sequencing and can run on tumor–normal, tumor-only and formalin-fixed paraffin-embedded samples. To train DeepSomatic and help address the dearth of publicly available training and benchmarking data for somatic variant detection, we generated and make openly available the Cancer Standards Long-read Evaluation (CASTLE) dataset of six matched tumor–normal cell line pairs whole-genome sequenced with Illumina, PacBio HiFi and Oxford Nanopore Technologies, along with benchmark variant sets. Across samples, both cell line and patient-derived, and across short-read and long-read sequencing technologies, DeepSomatic consistently outperforms existing callers.
Global earthquake detection and warning using Android phones
Marc Stogaitis
Youngmin Cho
Richard Allen
Boone Spooner
Patrick Robertson
Micah Berman
Greg Wimpey
Robert Bosch
Nivetha Thiruverahan
Steve Malkos
Alexei Barski
Science, 389 (2025), pp. 254-259
Earthquake early-warning systems are increasingly being deployed as a strategy to reduce losses in earthquakes, but the regional seismic networks they require do not exist in many earthquake-prone countries. We use the global Android smartphone network to develop an earthquake detection capability, an alert delivery system, and a user feedback framework. Over 3 years of operation, the system detected an average of 312 earthquakes per month with magnitudes from M 1.9 to M 7.8 in Türkiye. Alerts were delivered in 98 countries for earthquakes with M ≥4.5, corresponding to ~60 events and 18 million alerts per month. User feedback shows that 85% of people receiving an alert felt shaking, and 36, 28, and 23% received the alert before, during, and after shaking, respectively. We show how smartphone-based earthquake detection algorithms can be implemented at scale and improved through postevent analysis.
Light-microscopy-based dense connectomic reconstruction of mammalian brain tissue
Mojtaba R. Tavakoli
Julia Lyudchik
Vitali Vistunou
Nathalie Agudelo Duenas
Jakob Vorlaufer
Christoph Sommer
Caroline Kreuzinger
Barbara de Souza Oliveira
Alban Cenameri
Gaia Novarino
Johann Danzl
Nature (2025)
The information-processing capability of the brain’s cellular network depends on the physical wiring pattern between neurons and their molecular and functional characteristics. Charting neurons and resolving the individual synaptic connections requires volumetric imaging at nanoscale resolution and comprehensive cellular contrast. Light microscopy is uniquely positioned to visualize specific molecules, but dense, synapse-level circuit reconstruction by light microscopy has been out of reach due to limitations in resolution, contrast, and volumetric imaging capability. Here we developed light-microscopy-based connectomics (LICONN). We integrated hydrogel embedding and expansion with comprehensive deep-learning-based segmentation and analysis of connectivity, thus directly incorporating molecular information in synapse-level brain tissue reconstructions. LICONN will allow synapse-level brain tissue phenotyping in biological experiments in a readily adoptable manner.
Neural general circulation models for weather and climate
Dmitrii Kochkov
Janni Yuval
Ian Langmore
Jamie Smith
Griffin Mooers
Milan Kloewer
James Lottes
Peter Dueben
Samuel Hatfield
Peter Battaglia
Alvaro Sanchez
Matthew Willson
Stephan Hoyer
Nature, 632 (2024), pp. 1060-1066
General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators that combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine-learning models trained on reanalysis data have achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present a GCM that combines a differentiable solver for atmospheric dynamics with machine-learning components and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best machine-learning and physics-based methods. NeuralGCM is competitive with machine-learning models for one- to ten-day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for one- to fifteen-day forecasts. With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics for multiple decades, and climate forecasts with 140-kilometre resolution show emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs, although our model does not extrapolate to substantially different future climates. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system.
A petavoxel fragment of human cerebral cortex reconstructed at nanoscale resolution
Alex Shapson-Coe
Daniel R. Berger
Yuelong Wu
Richard L. Schalek
Shuohong Wang
Neha Karlupia
Sven Dorkenwald
Evelina Sjostedt
Dongil Lee
Luke Bailey
Angerica Fitzmaurice
Rohin Kar
Benjamin Field
Hank Wu
Julian Wagner-Carena
David Aley
Joanna Lau
Zudi Lin
Donglai Wei
Hanspeter Pfister
Adi Peleg
Jeff W. Lichtman
Science (2024)
To fully understand how the human brain works, knowledge of its structure at high resolution is needed. Presented here is a computationally intensive reconstruction of the ultrastructure of a cubic millimeter of human temporal cortex that was surgically removed to gain access to an underlying epileptic focus. It contains about 57,000 cells, about 230 millimeters of blood vessels, and about 150 million synapses and comprises 1.4 petabytes. Our analysis showed that glia outnumber neurons 2:1, oligodendrocytes were the most common cell, deep layer excitatory neurons could be classified on the basis of dendritic orientation, and among thousands of weak connections to each neuron, there exist rare powerful axonal inputs of up to 50 synapses. Further studies using this resource may bring valuable insights into the mysteries of the human brain.
Global prediction of extreme floods in ungauged watersheds
Vusumuzi Dube
Martin Gauch
Shaun Harrigan
Daniel Klotz
Frederik Kratzert
Asher Metzger
Sella Nevo
Florian Pappenberger
Christel Prudhomme
Guy Shalev
Shlomo Shenzis
Tadele Yednkachw Tekalign
Dana Weitzner
Nature (2024)
Floods are one of the most common natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow gauge networks. Accurate and timely warnings are critical for mitigating flood risks, but hydrological simulation models typically must be calibrated to long data records in each watershed. Here we show that AI-based forecasting achieves reliability in predicting extreme riverine events in ungauged watersheds at up to a 5-day lead time that is similar to or better than the reliability of nowcasts (0-day lead time) from a current state-of-the-art global modeling system (the Copernicus Emergency Management Service Global Flood Awareness System). Additionally, we achieve accuracies over 5-year return period events that are similar to or better than current accuracies over 1-year return period events. This means that AI can provide flood warnings earlier and over larger and more impactful events in ungauged basins. The model developed in this paper was incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.
Mapping the ionosphere with millions of phones
Jamie Smith
Anton Geraschenko
Jade Morton
Frank van Diggelen
Nature (2024)
The ionosphere is a layer of weakly ionized plasma bathed in Earth’s geomagnetic field extending about 50–1,500 kilometres above Earth. The ionospheric total electron content varies in response to Earth’s space environment, interfering with Global Navigation Satellite System (GNSS) signals, resulting in one of the largest sources of error for position, navigation and timing services. Networks of high-quality ground-based GNSS stations provide maps of ionospheric total electron content to correct these errors, but large spatiotemporal gaps in data from these stations mean that these maps may contain errors. Here we demonstrate that a distributed network of noisy sensors, in the form of millions of Android phones, can fill in many of these gaps and double the measurement coverage, providing an accurate picture of the ionosphere in areas of the world underserved by conventional infrastructure. Using smartphone measurements, we resolve features such as plasma bubbles over India and South America, solar-storm-enhanced density over North America and a mid-latitude ionospheric trough over Europe. We also show that the resulting ionosphere maps can improve location accuracy, which is our primary aim. This work demonstrates the potential of using a large distributed network of smartphones as a powerful scientific instrument for monitoring Earth.

Explore and engage with our research via curated NotebookLMs

Our Broader Mission
