Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
1 - 15 of 360 publications
Neural general circulation models for modeling precipitation
Stephan Hoyer
Dmitrii Kochkov
Janni Yuval
Ian Langmore
Science Advances (2026)
Preview abstract
Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. While hybrid models combining machine learning and physics have emerged with the premise of improving precipitation simulations, none have proven sufficiently skillful or stable to outperform existing models in simulating precipitation.
Here, we present the first hybrid model that is trained directly on precipitation observations. The model runs at 2.8 degrees resolution and is built on the differentiable NeuralGCM framework. This model is stable for decadal simulations and demonstrates significant improvements over existing GCMs, ERA5 reanalysis, and a Global Cloud-Resolving Model in simulating precipitation. Our approach yields reduced biases, a more realistic precipitation distribution, improved representation of extremes, and a more accurate diurnal cycle.
Furthermore, it outperforms the ECMWF ensemble for mid-range weather forecasting.
This advance paves the way for more reliable simulations of current climate and for the ability to fully utilize the abundance of existing observations to further improve GCMs.
View details
Expert evaluation of LLM world models: A high-Tc superconductivity case study
Haoyu Guo
Maria Tikhanovskaya
Paul Raccuglia
Alexey Vlaskin
Chris Co
Scott Ellsworth
Matthew Abraham
Lizzie Dorfman
Peter Armitage
Chunhan Feng
Antoine Georges
Olivier Gingras
Dominik Kiese
Steve Kivelson
Vadim Oganesyan
Brad Ramshaw
Subir Sachdev
Senthil Todadri
John Tranquada
Eun-Ah Kim
Proceedings of the National Academy of Sciences (2026)
Preview abstract
Large Language Models (LLMs) show great promise as a powerful tool for scientific literature exploration. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. This work evaluates the performance of six different LLM-based systems for answering scientific literature questions, including commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. We conduct a rigorous expert evaluation of the systems in the domain of high-temperature cuprate superconductors, a research area that involves material science, experimental physics, computation, and theoretical physics. We use an expert-curated database of 1726 scientific papers and a set of 67 expert-formulated questions. The evaluation employs a multi-faceted rubric assessing balanced perspectives, factual comprehensiveness, succinctness, evidentiary support, and image relevance. Our results demonstrate that RAG-based systems, powered by curated data and multimodal retrieval, outperform existing closed models across key metrics, particularly in providing comprehensive and well-supported answers, and in retrieving relevant visual information. This study provides valuable insights into designing and evaluating specialized scientific literature understanding systems, particularly with expert involvement, while also highlighting the importance of rich, domain-specific data in such systems.
View details
Accurate human genome analysis with Element Avidity sequencing
Andrew Carroll
Daniel Cook
Lucas Brambrink
Bryan Lajoie
Kelly N. Wiseman
Sophie Billings
Semyon Kruglyak
Bryan R. Lajoie
Junhua Zhao
Shawn E. Levy
Kishwar Shafin
Maria Nattestad
BMC Bioinformatics (2025)
Preview abstract
We investigate the new sequencing technology Avidity from Element Biosciences. We show that Avidity whole genome sequencing matches Illumina in mapping and variant calling accuracy at high coverages (30x-50x) and is noticeably more accurate at lower coverages (20x-30x). We quantify base error rates of Element reads, finding lower error rates, especially in homopolymer and tandem repeat regions. We use Element’s ability to generate paired-end sequencing with longer insert sizes than typical short-read sequencing. We show that longer insert sizes result in even higher accuracy, with long-insert Element sequencing giving noticeably more accurate genome analyses at all coverages.
View details
Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
Marc Stogaitis
Tajinder Gadh
Richard Allen
Alexei Barski
Robert Bosch
Patrick Robertson
Youngmin Cho
Nivetha Thiruverahan
Aman Raj
Geophysical Journal International (2025), ggae436
Preview abstract
This paper presents a novel approach for estimating ground shaking intensity using real-time social media data and CCTV footage. Employing the Gemini 1.5 Pro model (Reid et al. 2024), a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model’s output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. Gemini’s ability to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.
View details
Unprecedented Insights into Maternal Sleep: A Large-scale Longitudinal Analysis of Real-world Wearable Device Data Before, During, and After Pregnancy
Nichole Young-Lin
Conor Heneghan
Logan Schneider
Logan Niehaus
Ariel Haney
Karla Gleichauf
Jacqueline Shreibati
Belen Lafon
Lancet eBioMedicine (2025)
Preview abstract
Introduction: Current understanding of pregnancy and postpartum sleep is driven by limited lab or self-reported data. Consumer wearable devices may help reveal longitudinal, real-world sleep patterns.
Methods: We analyzed de-identified wearable device data from 2,540 users in the United States and Canada who met strict wear-time requirements (≥80% daily usage for ≥80% of the time periods of interest [12 weeks prepregnancy, throughout pregnancy, and 20 weeks immediately postpartum]). We tracked sleep time and staging using Fitbit devices.
Results: Compared to prepregnancy, total sleep time (TST) increased from an average of 425.3±43.5 min to a peak of 447.6±47.6 min at gestational week 10 with ongoing declines throughout pregnancy. Time in bed (TIB) followed a similar pattern. Increased light sleep drove the initial TST rise. Deep and REM sleep decreased significantly throughout pregnancy, with maximum reductions of 19.2±13.8 min (p<0.01) and 9.0±19.2 min (p<0.01) respectively by pregnancy end. Sleep efficiency also declined slightly during pregnancy (median drop from 88.3% to 86.8%). After delivery, TIB remained below the prepregnancy baseline by 14.7±45.7 min at one year postpartum and 15.2±47.7 min at 1.5 years postpartum.
Conclusion: This unprecedented look at large-scale, real-world sleep and pregnancy patterns revealed a previously unquantified initial increase in sleep followed by decreases in both quantity and quality as pregnancy progresses. Sleep deficits persist for at least 1.5 years postpartum. These quantified trends can assist clinicians and patients in understanding what to expect.
View details
IM-DD vs. Coherent in Datacenters: A Revisit in 2025
Optical Fiber Communication (OFC) Conference 2025 (2025)
Preview abstract
This tutorial examines the progress and scaling limitations of IM-DD based optical technologies and explores how coherent technology optimized for datacenter use cases, including a newly proposed polarization-folding, time-diversity approach and a novel single-sideband coherent detection technology, can address some of these challenges.
View details
Preview abstract
As part of Google's ongoing efforts to define best practices for secure AI systems, we’re sharing our aspirational framework for secure AI agents. We advocate for a hybrid, defense-in-depth strategy that combines the strengths of traditional, deterministic security controls with dynamic, reasoning-based defenses. This approach is grounded in three core principles: agents must have well-defined human controllers, their powers must be carefully limited, and their actions and planning must be observable. This paper reflects our current thinking and the direction of our efforts as we work towards ensuring that AI agents can be powerful, useful, and secure by default.
View details
ReviewCoin: Paying for Real Work
arXiv, Google Research (2025)
Preview abstract
The peer-review process is broken and the problem is getting worse, especially in AI: large conferences like NeurIPS increasingly struggle to adequately review huge numbers of paper submissions. I propose a scalable solution that, foremost, recognizes reviewing as important, necessary work and rewards it with crypto-coins owned and managed by the conferences themselves. The idea is at its core quite simple: paper submissions require work (reviews, meta-reviews, etc.) to be done, and therefore the submitter must pay for that work. Each reviewer submits their review to be approved by some designated conference officer (e.g. PC chair, Area Chair, etc.), and upon approval is paid a single coin for a single review. If three reviews are required, the cost of submission should be three coins + a tax that covers payments to all the volunteers who organize the conference. After some one-time startup costs to fairly distribute coins, the process should be relatively stable with new coins minted only when a conference grows.
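The payment rule in this abstract can be sketched as a small ledger update. This is only an illustration of the stated rule (one coin per approved review, plus a tax for the organizers), not the paper's implementation: the function name `settle_submission`, the account names, and the tax of one coin are all assumptions made for the example.

```python
from collections import Counter


def settle_submission(balances, submitter, reviewers, organizer_pool, tax):
    """Move coins for one submission: one coin per approved review plus a tax.

    All names and the tax amount are hypothetical; the abstract only states
    that the cost is the number of required reviews plus a tax.
    """
    cost = len(reviewers) + tax
    if balances[submitter] < cost:
        raise ValueError("submitter cannot afford the submission")
    balances[submitter] -= cost
    for reviewer in reviewers:
        balances[reviewer] += 1  # one coin for one approved review
    balances[organizer_pool] += tax  # the tax pays the conference volunteers
    return cost


# Hypothetical starting state: the author holds five coins.
balances = Counter({"author": 5})
cost = settle_submission(
    balances, "author", ["reviewer_a", "reviewer_b", "reviewer_c"], "organizers", tax=1
)
print(cost, dict(balances))
```

In this sketch no coins are created or destroyed by a submission; coins only move from the submitter to the reviewers and organizers, which matches the abstract's point that new coins are minted only when a conference grows.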
View details
Towards AI-assisted academic writing
Malcolm Kane
Madeleine Grunde-McLaughlin
Ian Lang
Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, Association for Computational Linguistics (2025), pp. 31-45
Preview abstract
We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user’s current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.
View details
Preview abstract
The need for characterizing global variability of atmospheric carbon dioxide (CO2) is quickly increasing, with a growing urgency for tracking greenhouse gases with sufficient resolution, precision and accuracy so as to support independent verification of CO2 fluxes at local to global scales. The current generation of space-based sensors, however, can only provide sparse observations in space and/or in time, by design. While upcoming missions may address some of these challenges, most are still years away from launch. This challenge has fueled interest in the potential use of data from existing missions originally developed for other applications for inferring global greenhouse gas variability.
The Advanced Baseline Imager (ABI) onboard the Geostationary Operational Environmental Satellite (GOES-East), operational since 2017, provides full coverage of much of the western hemisphere at 10-minute intervals from geostationary orbit at 16 wavelengths. We leverage this high temporal resolution by developing a single-pixel, fully-connected neural network to estimate dry-air column CO2 mole fractions (XCO2). The model employs a time series of GOES-East's 16 spectral bands, which aids in disentangling atmospheric CO2 from surface reflectance, alongside ECMWF ERA5 lower tropospheric meteorology, solar angles, and day of year. Training used collocated GOES-East and OCO-2/OCO-3 observations (2017-2020, within 5 km and 10 minutes), with validation and testing performed on 2021 data.
The model successfully captures monthly latitudinal XCO2 gradients and shows reasonable agreement with ground-based TCCON measurements. Furthermore, we demonstrate the model's ability to detect elevated XCO2 signals from high-emitting power plants, particularly over low-reflectance surfaces. We also confirm that removing bands 5 (1.6 µm) and 16 (13.3 µm) substantially decreases performance, indicating that the model is able to extract useful information from these bands.
Although GOES-East derived XCO2 precision may not rival dedicated instruments, its unprecedented combination of contiguous geographic coverage, 10-minute temporal frequency, and multi-year record offers the potential to observe aspects of atmospheric CO2 variability currently unseen from space, with further potential through spatio-temporal aggregation.
View details
Simulation-Based Inference: A Practical Guide
Michael Deistler
Jan Boelts
Peter Steinbach
Guy Moss
Thomas Moreau
Manuel Gloeckler
Pedro L. C. Rodriguez
Julia Linhart
Janne K. Lappalainen
Benjamin Kurt Miller
Pedro J. Goncalves
Cornelius Schröder
Jakob H. Macke
arXiv (2025)
Preview abstract
A central challenge in many areas of science and engineering is to identify model parameters that are consistent with empirical data and prior knowledge. Bayesian inference offers a principled framework for this task, but can be computationally prohibitive when models are defined by stochastic simulators. Simulation-Based Inference (SBI) provides a suite of methods to overcome this limitation and has enabled scientific discoveries in fields such as particle physics, astrophysics and neuroscience. The core idea of SBI is to train neural networks on data generated by a simulator, without requiring access to likelihood evaluations. Once trained, the neural network can rapidly perform inference on empirical observations without requiring additional optimization or simulations. In this tutorial, we provide a practical guide for practitioners aiming to apply SBI methods. We outline a structured SBI workflow and offer practical guidelines and diagnostic tools for every stage of the process--from setting up the simulator and prior, choosing the SBI method and neural network architecture, training the inference model, to validating results and interpreting the inferred parameters. We illustrate these steps through examples from astrophysics, psychophysics, and neuroscience. This tutorial empowers researchers to apply state-of-the-art SBI methods, facilitating efficient parameter inference for scientific discovery.
View details
Consideration on CMAS arriving as discrete particles
Eric H. Jordan
Stephen Jordan
Hiram Diaz
Byung-gun Jun
(2025)
Preview abstract
Turbine contaminants known as CMAS mostly arrive at turbine hot sections as individual particles in a range of mineral compositions. There they are deposited as splats and, within a small area, can be treated as arriving at random locations. By the time the particles reach the hot section, the maximum particle size is believed to be 10 microns. A simplified heat transfer analysis suggests the arriving temperature will be the turbine inlet temperature. Using AFRL03 as a representative set of possible minerals, for most turbine inlet temperatures a mixture of melted and un-melted particles will arrive. There are 31 combinations of the 5 minerals of AFRL03, presenting a wide range of melting points, which are experimentally investigated in this paper. As expected, combinations generally melt at lower temperatures than the highest-melting mineral in each combination. The progression of conditions starting with the arrival of isolated individual minerals is modeled using Monte Carlo simulations and known materials from percolation theory. This allows understanding of the development of coverage fraction and potential for mineral mixing important to melt behavior as a function of normalized CMAS dose. Using the normalized CMAS dose, it is also possible to comment on the likely relative fraction of coating life during which less than fully homogenized CMAS dominates behavior. It is noteworthy that 4 out of 5 minerals and 4 mineral combinations lack either calcium or silicon or both and also melt below 1300°C. Interaction in the early deposition stage involves non-CMAS-like chemistries.
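Two quantities in this abstract are easy to check with a short sketch: the count of 31 mineral combinations (every non-empty subset of 5 minerals, 2^5 - 1), and how a coverage fraction grows with dose when particles land at random locations. The code below is not the paper's model. It assumes a toy surface of equal cells, one particle covering exactly one cell, and a dose defined as particles per cell; the mineral labels are placeholders.

```python
import itertools
import math
import random

# Placeholder labels: the abstract does not name the five AFRL03 minerals.
MINERALS = ["mineral_1", "mineral_2", "mineral_3", "mineral_4", "mineral_5"]


def nonempty_combinations(items):
    """Return every non-empty subset of items, smallest subsets first."""
    return [
        combo
        for size in range(1, len(items) + 1)
        for combo in itertools.combinations(items, size)
    ]


def simulate_coverage(n_cells, n_particles, rng):
    """Drop particles on uniformly random cells; return the covered fraction."""
    covered = set()
    for _ in range(n_particles):
        covered.add(rng.randrange(n_cells))
    return len(covered) / n_cells


if __name__ == "__main__":
    print(len(nonempty_combinations(MINERALS)))  # 2**5 - 1 combinations
    rng = random.Random(0)
    n_cells = 10_000
    for dose in (0.25, 0.5, 1.0, 2.0):
        simulated = simulate_coverage(n_cells, int(dose * n_cells), rng)
        # For random arrivals on many cells, coverage approaches 1 - exp(-dose).
        expected = 1.0 - math.exp(-dose)
        print(f"dose={dose:.2f} simulated={simulated:.3f} expected={expected:.3f}")
```

Under these assumptions, coverage is incomplete even at a dose of one particle per cell, because random arrivals overlap. That is the kind of effect a Monte Carlo treatment of random deposition would capture.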
View details
Instability of steady-state mixed-state symmetry-protected topological order to strong-to-weak spontaneous symmetry breaking
Jeet Shah
Christopher Fechisin
Yuxin Wang
Joseph T. Iosue
James D. Watson
Yan-Qi Wang
Brayden Ware
Cheng-Ju Lin
Alexey Gorshkov
Quantum (2025)
Preview abstract
Recent experimental progress in controlling open quantum systems enables the pursuit of mixed-state nonequilibrium quantum phases. We investigate whether open quantum systems hosting mixed-state symmetry-protected topological states as steady states retain this property under symmetric perturbations. Focusing on the decohered cluster state, a mixed-state symmetry-protected topological state protected by a combined strong and weak symmetry, we construct a parent Lindbladian that hosts it as a steady state. This Lindbladian can be mapped onto exactly solvable reaction-diffusion dynamics, even in the presence of certain perturbations, allowing us to solve the parent Lindbladian in detail and reveal previously unknown steady states. Using both analytical and numerical methods, we find that typical symmetric perturbations cause strong-to-weak spontaneous symmetry breaking at arbitrarily small perturbations, destabilizing the steady-state mixed-state symmetry-protected topological order. However, when perturbations introduce only weak symmetry defects, the steady-state mixed-state symmetry-protected topological order remains stable. Additionally, we construct a quantum channel which replicates the essential physics of the Lindbladian and can be efficiently simulated using only Clifford gates, Pauli measurements, and feedback.
View details
CURIE: Evaluating LLMs on multitask long context scientific understanding and reasoning
Matthew Abraham
Haining Pan
Zahra Shamsi
Muqthar Mohammad
Chenfei Jiang
Ruth Alcantara
Gowoon Cheon
Xuejian Ma
Michael Statt
Jackson Cui
Nayantara Mudur
Eun-Ah Kim
Paul Raccuglia
Victor V. Albert
Lizzie Dorfman
Brian Rohr
Shutong Li
Maria Tikhanovskaya
Drew Purves
Elise Kleeman
Philippe Faist
Ean Phing VanLee
International Conference on Learning Representations (ICLR) (2025)
Preview abstract
The core of the scientific problem-solving process involves synthesizing information while applying expert knowledge. Large Language Models (LLMs) have the potential to accelerate this process due to their extensive knowledge across a variety of domains. Recent advancements have also made it possible for LLMs to handle very long "in-context" content. However, existing evaluations of long-context LLMs have focused on assessing their ability to summarize or retrieve information within the given context, primarily in generalist tasks that do not require deep scientific expertise. To facilitate analogous assessments of domain-specific tasks, we introduce the scientific long-Context Understanding and Reasoning Inference Evaluations (CURIE) benchmark. This benchmark provides a set of 8 challenging tasks, derived from around 250 scientific research papers, requiring domain expertise, comprehension of long in-context information, and multi-step reasoning that tests the ability of LLMs to assist scientists in realistic workflows. Tasks in CURIE have been collected from experts in six disciplines - materials science, theoretical condensed matter physics, quantum computing, geospatial analysis, biodiversity, and protein sequencing - covering both experimental and theoretical workflows in science. We evaluate a range of closed and open LLMs on these tasks. Additionally, we propose strategies for task decomposition, which allow for a more nuanced evaluation of the models and facilitate staged multi-step assessments. We hope that insights gained from CURIE can guide the future development of LLMs.
View details
Confinement in a Z2 lattice gauge theory on a quantum computer
Julius Mildenberger
Wojtek Mruczkiewicz
Jad C. Halimeh
Philipp Hauke
Nature Physics (2025)
Preview abstract
Gauge theories describe the fundamental forces in the standard model of particle physics and play an important role in condensed-matter physics. The constituents of gauge theories, for example, charged matter and electric gauge field, are governed by local gauge constraints, which lead to key phenomena such as the confinement of particles that are not fully understood. In this context, quantum simulators may address questions that are challenging for classical methods. Although engineering gauge constraints is highly demanding, recent advances in quantum computing are beginning to enable digital quantum simulations of gauge theories. Here we simulate confinement dynamics in a Z2 lattice gauge theory on a superconducting quantum processor. Tuning a term that couples only to the electric field produces confinement of charges, a manifestation of the tight bond that the gauge constraint generates between both. Moreover, we show how a modification of the gauge constraint from Z2 towards U(1) symmetry freezes the system dynamics. Our work illustrates the restriction that the underlying gauge constraint imposes on the dynamics of a lattice gauge theory, showcases how gauge constraints can be modified and protected, and promotes the study of other models governed by multibody interactions.
View details