Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 360 publications
    Neural general circulation models for modeling precipitation
    Stephan Hoyer
    Dmitrii Kochkov
    Janni Yuval
    Ian Langmore
    Science Advances (2026)
    Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. While hybrid models combining machine learning and physics have emerged with the premise of improving precipitation simulations, none has proven skillful or stable enough to outperform existing models in simulating precipitation. Here, we present the first hybrid model that is trained directly on precipitation observations. The model runs at 2.8° resolution and is built on the differentiable NeuralGCM framework. It is stable for decadal simulations and demonstrates significant improvements over existing GCMs, ERA5 reanalysis, and a global cloud-resolving model in simulating precipitation. Our approach yields reduced biases, a more realistic precipitation distribution, improved representation of extremes, and a more accurate diurnal cycle. Furthermore, it outperforms the ECMWF ensemble for medium-range weather forecasting. This advance paves the way for more reliable simulations of the current climate and for fully utilizing the abundance of existing observations to further improve GCMs.
    Expert evaluation of LLM world models: A high-Tc superconductivity case study
    Haoyu Guo
    Maria Tikhanovskaya
    Paul Raccuglia
    Alexey Vlaskin
    Chris Co
    Scott Ellsworth
    Matthew Abraham
    Lizzie Dorfman
    Peter Armitage
    Chunhan Feng
    Antoine Georges
    Olivier Gingras
    Dominik Kiese
    Steve Kivelson
    Vadim Oganesyan
    Brad Ramshaw
    Subir Sachdev
    Senthil Todadri
    John Tranquada
    Eun-Ah Kim
    Proceedings of the National Academy of Sciences (2026)
    Large Language Models (LLMs) show great promise as tools for exploring the scientific literature. However, their effectiveness in providing scientifically accurate and comprehensive answers to complex questions within specialized domains remains an active area of research. This work evaluates six LLM-based systems for answering scientific literature questions, including commercially available closed models and a custom retrieval-augmented generation (RAG) system capable of retrieving images alongside text. We conduct a rigorous expert evaluation of the systems in the domain of high-temperature cuprate superconductors, a research area that spans materials science, experimental physics, computation, and theoretical physics. We use an expert-curated database of 1,726 scientific papers and a set of 67 expert-formulated questions. The evaluation employs a multi-faceted rubric assessing balanced perspectives, factual comprehensiveness, succinctness, evidentiary support, and image relevance. Our results demonstrate that RAG-based systems, powered by curated data and multimodal retrieval, outperform existing closed models across key metrics, particularly in providing comprehensive and well-supported answers and in retrieving relevant visual information. This study provides insights into designing and evaluating specialized scientific literature understanding systems, particularly with expert involvement, and highlights the importance of rich, domain-specific data in such systems.
    Accurate human genome analysis with Element Avidity sequencing
    Andrew Carroll
    Daniel Cook
    Lucas Brambrink
    Bryan Lajoie
    Kelly N. Wiseman
    Sophie Billings
    Semyon Kruglyak
    Bryan R. Lajoie
    Junhua Zhao
    Shawn E. Levy
    Kishwar Shafin
    Maria Nattestad
    BMC Bioinformatics (2025)
    We investigate Avidity, a new sequencing technology from Element Biosciences. We show that Avidity whole-genome sequencing matches Illumina in mapping and variant-calling accuracy at high coverages (30x-50x) and is noticeably more accurate at lower coverages (20x-30x). We quantify the base error rates of Element reads, finding lower error rates, especially in homopolymer and tandem repeat regions. We also exploit Element's ability to generate paired-end reads with longer insert sizes than typical short-read sequencing. Longer insert sizes yield even higher accuracy, with long-insert Element sequencing giving noticeably more accurate genome analyses at all coverages.
    Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts
    Marc Stogaitis
    Tajinder Gadh
    Richard Allen
    Alexei Barski
    Robert Bosch
    Patrick Robertson
    Youngmin Cho
    Nivetha Thiruverahan
    Aman Raj
    Geophysical Journal International (2025), ggae436
    This paper presents a novel approach for estimating ground-shaking intensity using real-time social media data and CCTV footage. Employing Gemini 1.5 Pro (Reid et al. 2024), a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data using generative AI and natural language processing. The model's output, in the form of Modified Mercalli Intensity (MMI) values, aligns well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to draw on additional sources of knowledge in its reasoning and decision-making, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI, which it presumably acquired during training. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. Gemini's ability to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini to augment our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced eyewitness accounts for assessing earthquake impact and providing situational awareness during a crisis. This approach holds great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study is a significant step toward harnessing social media and AI for earthquake disaster mitigation.
    Unprecedented Insights into Maternal Sleep: A Large-scale Longitudinal Analysis of Real-world Wearable Device Data Before, During, and After Pregnancy
    Nichole Young-Lin
    Conor Heneghan
    Logan Schneider
    Logan Niehaus
    Ariel Haney
    Karla Gleichauf
    Jacqueline Shreibati
    Belen Lafon
    Lancet eBioMedicine (2025)
    Introduction: Current understanding of pregnancy and postpartum sleep is driven by limited lab or self-reported data. Consumer wearable devices may help reveal longitudinal, real-world sleep patterns. Methods: We analyzed de-identified wearable device data from 2,540 users in the United States and Canada who met strict wear-time requirements (≥80% daily usage for ≥80% of the time periods of interest [12 weeks prepregnancy, throughout pregnancy, and 20 weeks immediately postpartum]). We tracked sleep time and staging using Fitbit devices. Results: Compared to prepregnancy, total sleep time (TST) increased from an average of 425.3±43.5 min to a peak of 447.6±47.6 min at gestational week 10, then declined steadily through the rest of pregnancy. Time in bed (TIB) followed a similar pattern. Increased light sleep drove the initial TST rise. Deep and REM sleep decreased significantly throughout pregnancy, with maximum reductions of 19.2±13.8 min (p<0.01) and 9.0±19.2 min (p<0.01), respectively, by the end of pregnancy. Sleep efficiency also declined slightly during pregnancy (median drop from 88.3% to 86.8%). After delivery, TIB remained below the prepregnancy baseline by 14.7±45.7 min at one year postpartum and 15.2±47.7 min at 1.5 years postpartum. Conclusion: This unprecedented look at large-scale, real-world sleep and pregnancy patterns revealed a previously unquantified initial increase in sleep, followed by decreases in both quantity and quality as pregnancy progresses. Sleep deficits persist for at least 1.5 years postpartum. These quantified trends can help clinicians and patients understand what to expect.
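The wear-time inclusion rule above can be read as a per-user filter. The sketch below is one possible reading, assuming the 80% day requirement is applied separately to each period of interest and that daily wear is available as a fraction of the day; the function and constant names are hypothetical, not taken from the study.

```python
from typing import Dict, List

DAILY_WEAR_THRESHOLD = 0.80   # a day counts if the device was worn >=80% of it
PERIOD_DAY_THRESHOLD = 0.80   # ...and such days must make up >=80% of each period


def meets_wear_time(periods: Dict[str, List[float]]) -> bool:
    """Return True if a user satisfies the (assumed) wear-time rule.

    `periods` maps a period name (e.g. "prepregnancy", "pregnancy",
    "postpartum") to a list of daily wear fractions in [0, 1].
    Assumption: the 80%-of-days requirement is checked per period.
    """
    for days in periods.values():
        if not days:
            return False  # no data for a period cannot satisfy the rule
        compliant_days = sum(1 for fraction in days if fraction >= DAILY_WEAR_THRESHOLD)
        if compliant_days / len(days) < PERIOD_DAY_THRESHOLD:
            return False
    return True
```

With 12 weeks prepregnancy (84 days) and 20 weeks postpartum (140 days), a user who wears the device 90% of every day passes, while one with only 7 of 10 compliant days in any period fails. Days with missing data need an explicit policy (for example, counting them as 0% wear), which this sketch leaves to the caller.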
    IM-DD vs. Coherent in Datacenters: A Revisit in 2025
    Optical Fiber Communication (OFC) Conference 2025 (2025)
    This tutorial examines the progress and scaling limitations of IM-DD-based optical technologies and explores how coherent technology optimized for datacenter use cases, including a newly proposed polarization-folding, time-diversity approach and a novel single-sideband coherent detection technology, can address some of these challenges.
    Google's Approach for Secure AI Agents
    Santiago (Sal) Díaz
    Kara Olive
    Google (2025)
    As part of Google's ongoing efforts to define best practices for secure AI systems, we’re sharing our aspirational framework for secure AI agents. We advocate for a hybrid, defense-in-depth strategy that combines the strengths of traditional, deterministic security controls with dynamic, reasoning-based defenses. This approach is grounded in three core principles: agents must have well-defined human controllers, their powers must be carefully limited, and their actions and planning must be observable. This paper reflects our current thinking and the direction of our efforts as we work towards ensuring that AI agents can be powerful, useful, and secure by default.
    The peer-review process is broken and the problem is getting worse, especially in AI: large conferences like NeurIPS increasingly struggle to adequately review huge numbers of paper submissions. I propose a scalable solution that, foremost, recognizes reviewing as important, necessary work and rewards it with crypto-coins owned and managed by the conferences themselves. The idea is at its core quite simple: paper submissions require work (reviews, meta-reviews, etc.) to be done, and therefore the submitter must pay for that work. Each reviewer submits their review for approval by a designated conference officer (e.g., PC chair or Area Chair), and upon approval is paid one coin per review. If three reviews are required, the cost of submission should be three coins plus a tax that covers payments to all the volunteers who organize the conference. After some one-time startup costs to fairly distribute coins, the process should be relatively stable, with new coins minted only when a conference grows.
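The coin accounting in the proposal above can be illustrated with a toy ledger. This is a minimal sketch, not the author's design: the class, its method names, and the one-coin tax are hypothetical, and reviewers are assumed to be paid out of the coins collected at submission.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConferenceLedger:
    """Toy ledger for a review-coin scheme (all names hypothetical)."""
    reviews_required: int = 3
    tax: int = 1  # hypothetical organizer tax, in coins
    balances: Dict[str, int] = field(default_factory=dict)
    treasury: int = 0

    def submission_cost(self) -> int:
        # One coin per required review, plus the organizer tax.
        return self.reviews_required + self.tax

    def submit_paper(self, author: str) -> None:
        cost = self.submission_cost()
        if self.balances.get(author, 0) < cost:
            raise ValueError(f"{author} cannot afford {cost} coins")
        self.balances[author] -= cost
        self.treasury += cost

    def approve_review(self, reviewer: str) -> None:
        # An approved review earns exactly one coin, paid from the treasury.
        if self.treasury < 1:
            raise ValueError("treasury is empty")
        self.treasury -= 1
        self.balances[reviewer] = self.balances.get(reviewer, 0) + 1
```

With three required reviews and a one-coin tax, a submission costs four coins. After three approved reviews, one coin (the tax) remains in the treasury for organizer payments, and the total number of coins is unchanged, which is consistent with minting new coins only when a conference grows.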
    Towards AI-assisted academic writing
    Malcolm Kane
    Madeleine Grunde-McLaughlin
    Ian Lang
    Proceedings of the 1st Workshop on AI and Scientific Discovery: Directions and Opportunities, Association for Computational Linguistics (2025), pp. 31-45
    We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user’s current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.
    The need for characterizing global variability of atmospheric carbon dioxide (CO2) is increasing quickly, with growing urgency to track greenhouse gases with sufficient resolution, precision, and accuracy to support independent verification of CO2 fluxes at local to global scales. The current generation of space-based sensors, however, can by design provide only observations that are sparse in space and/or time. While upcoming missions may address some of these challenges, most are still years away from launch. This challenge has fueled interest in inferring global greenhouse gas variability from existing missions originally developed for other applications. The Advanced Baseline Imager (ABI) onboard the Geostationary Operational Environmental Satellite (GOES-East), operational since 2017, provides full coverage of much of the western hemisphere at 10-minute intervals from geostationary orbit at 16 wavelengths. We leverage this high temporal resolution by developing a single-pixel, fully connected neural network to estimate dry-air column CO2 mole fractions (XCO2). The model uses a time series of GOES-East's 16 spectral bands, which aids in disentangling atmospheric CO2 from surface reflectance, alongside ECMWF ERA5 lower-tropospheric meteorology, solar angles, and day of year. Training used collocated GOES-East and OCO-2/OCO-3 observations (2017-2020, within 5 km and 10 minutes), with validation and testing performed on 2021 data. The model successfully captures monthly latitudinal XCO2 gradients and shows reasonable agreement with ground-based TCCON measurements. Furthermore, we demonstrate the model's ability to detect elevated XCO2 signals from high-emitting power plants, particularly over low-reflectance surfaces. We also confirm that removing bands 5 (1.6 µm) and 16 (13.3 µm) substantially decreases performance, indicating that the model extracts useful information from these bands.
    Although GOES-East-derived XCO2 precision may not rival that of dedicated instruments, its unprecedented combination of contiguous geographic coverage, 10-minute temporal frequency, and multi-year record offers the potential to observe aspects of atmospheric CO2 variability currently unseen from space, with further potential through spatio-temporal aggregation.
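The collocation criterion used to build the training set (GOES-East and OCO-2/OCO-3 observations within 5 km and 10 minutes) and the year-based split can be sketched as follows. This assumes great-circle distance via the haversine formula on a spherical Earth of radius 6371 km; the abstract does not say how distance was computed or how 2021 data were divided between validation and testing, so those details are assumptions, and all names are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0              # spherical-Earth assumption
MAX_DISTANCE_KM = 5.0                 # "within 5 km"
MAX_TIME_GAP = timedelta(minutes=10)  # "within 10 minutes"


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_collocated(goes_obs, oco_obs):
    """Each observation is a hypothetical (lat, lon, datetime) tuple."""
    lat1, lon1, t1 = goes_obs
    lat2, lon2, t2 = oco_obs
    return (haversine_km(lat1, lon1, lat2, lon2) <= MAX_DISTANCE_KM
            and abs(t1 - t2) <= MAX_TIME_GAP)


def assign_split(when: datetime) -> str:
    """2017-2020 for training; 2021 held out for validation and testing."""
    if 2017 <= when.year <= 2020:
        return "train"
    if when.year == 2021:
        return "validation/test"
    return "unused"
```

A pair 0.02° of latitude apart (about 2.2 km) and 5 minutes apart is collocated; a pair 0.1° apart (about 11 km) is not, and neither is a pair 11 minutes apart.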
    Simulation-Based Inference: A Practical Guide
    Michael Deistler
    Jan Boelts
    Peter Steinbach
    Guy Moss
    Thomas Moreau
    Manuel Gloeckler
    Pedro L. C. Rodriguez
    Julia Linhart
    Janne K. Lappalainen
    Benjamin Kurt Miller
    Pedro J. Goncalves
    Cornelius Schröder
    Jakob H. Macke
    arXiv (2025)
    A central challenge in many areas of science and engineering is to identify model parameters that are consistent with empirical data and prior knowledge. Bayesian inference offers a principled framework for this task, but can be computationally prohibitive when models are defined by stochastic simulators. Simulation-Based Inference (SBI) provides a suite of methods to overcome this limitation and has enabled scientific discoveries in fields such as particle physics, astrophysics, and neuroscience. The core idea of SBI is to train neural networks on data generated by a simulator, without requiring access to likelihood evaluations. Once trained, the neural network can rapidly perform inference on empirical observations without requiring additional optimization or simulations. In this tutorial, we provide a practical guide for practitioners aiming to apply SBI methods. We outline a structured SBI workflow and offer practical guidelines and diagnostic tools for every stage of the process: setting up the simulator and prior, choosing the SBI method and neural network architecture, training the inference model, validating results, and interpreting the inferred parameters. We illustrate these steps through examples from astrophysics, psychophysics, and neuroscience. This tutorial empowers researchers to apply state-of-the-art SBI methods, facilitating efficient parameter inference for scientific discovery.
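The core SBI idea above (fit an estimator to simulated parameter/data pairs, then reuse it on observations without further simulation) can be shown with a deliberately tiny example. In this sketch a least-squares linear regressor stands in for the neural network, and the simulator is a hypothetical linear-Gaussian toy (prior θ ~ N(0, 1), x | θ ~ N(θ, 1)) whose exact posterior is N(x/2, 1/2), so the fit can be checked. It is an illustration of amortized inference, not one of the tutorial's methods.

```python
import random
import statistics


def sample_prior(rng):
    """Hypothetical prior: theta ~ N(0, 1)."""
    return rng.gauss(0.0, 1.0)


def simulate(theta, rng, noise_sd=1.0):
    """Hypothetical stochastic simulator: x | theta ~ N(theta, noise_sd**2).

    Only forward sampling is used; the likelihood is never evaluated.
    """
    return theta + rng.gauss(0.0, noise_sd)


def train_amortized_estimator(n_sims=20_000, seed=0):
    """Fit theta ~ w*x + b on simulated pairs; return a reusable posterior function."""
    rng = random.Random(seed)
    thetas = [sample_prior(rng) for _ in range(n_sims)]
    xs = [simulate(t, rng) for t in thetas]
    mean_x, mean_t = statistics.fmean(xs), statistics.fmean(thetas)
    cov_xt = statistics.fmean([(x - mean_x) * (t - mean_t) for x, t in zip(xs, thetas)])
    var_x = statistics.fmean([(x - mean_x) ** 2 for x in xs])
    w = cov_xt / var_x
    b = mean_t - w * mean_x
    # Residual variance serves as a (homoscedastic) Gaussian posterior variance.
    resid_var = statistics.fmean([(t - (w * x + b)) ** 2 for x, t in zip(xs, thetas)])

    def posterior(x_obs):
        """Approximate posterior (mean, variance) for an observation; no new simulations."""
        return w * x_obs + b, resid_var

    return posterior
```

After training once, `posterior(2.0)` returns a mean near 1.0 and a variance near 0.5 instantly, for any number of observations. Replacing the linear fit with a neural network that outputs a full conditional density moves this toy toward neural posterior estimation, a standard SBI method.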
    Consideration on CMAS arriving as discrete particles
    Eric H. Jordan
    Stephen Jordan
    Hiram Diaz
    Byung-gun Jun
    (2025)
    Turbine contaminants known as CMAS mostly arrive at turbine hot sections as individual particles spanning a range of mineral compositions; there they are deposited and, within a small area, can be treated as splats arriving at random locations. By the time the particles reach the hot section, the maximum particle size is believed to be 10 microns. A simplified heat-transfer analysis suggests that the particles arrive at the turbine inlet temperature. Taking AFRL03 as a representative set of possible minerals, most turbine inlet temperatures will see a mixture of melted and unmelted particles arrive. The 5 minerals of AFRL03 form 31 combinations, which present a wide range of melting points and are investigated experimentally in this paper. As expected, combinations generally melt at lower temperatures than the highest-melting mineral in each combination. The progression of conditions, starting with the arrival of isolated individual minerals, is modeled using Monte Carlo simulations and known results from percolation theory. This clarifies how the coverage fraction and the potential for mineral mixing, which is important to melt behavior, develop as a function of normalized CMAS dose. Using the normalized CMAS dose, it is also possible to comment on the likely relative fraction of coating life during which less-than-fully homogenized CMAS dominates behavior. Notably, 4 of the 5 minerals and 4 mineral combinations lack calcium, silicon, or both, and also melt below 1300°C. Interactions in the early deposition stage involve non-CMAS-like chemistries.
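Two pieces of the reasoning above are easy to check numerically: 5 minerals give 2^5 - 1 = 31 non-empty combinations, and random splat deposition yields a coverage fraction that grows with dose. The sketch below uses placeholder mineral names, assumes normalized dose means total splat area divided by substrate area, and models splats as equal-sized discs dropped uniformly on a periodic unit square. For that idealized (Boolean) model the expected coverage is 1 - exp(-dose), which serves as a check. It is not the paper's simulation.

```python
import itertools
import math
import random

# Placeholder names; the actual AFRL03 minerals are not listed in the abstract.
MINERALS = ["mineral_A", "mineral_B", "mineral_C", "mineral_D", "mineral_E"]


def nonempty_combinations(items):
    """All non-empty subsets of `items` (2**n - 1 of them)."""
    return [combo
            for r in range(1, len(items) + 1)
            for combo in itertools.combinations(items, r)]


def coverage_fraction(normalized_dose, splat_radius=0.01, grid=400, seed=0):
    """Monte Carlo estimate of the area fraction covered by random circular splats.

    Assumes normalized_dose = (number of splats * splat area) / substrate area,
    with splat centres uniform on a periodic unit square.
    """
    rng = random.Random(seed)
    n_splats = round(normalized_dose / (math.pi * splat_radius ** 2))
    cell = 1.0 / grid
    reach = math.ceil(splat_radius / cell)  # cells to scan around each centre
    covered = [[False] * grid for _ in range(grid)]
    for _ in range(n_splats):
        cx, cy = rng.random(), rng.random()
        i0, j0 = int(cx / cell), int(cy / cell)
        for i in range(i0 - reach, i0 + reach + 1):
            for j in range(j0 - reach, j0 + reach + 1):
                # Test the cell centre in unwrapped coordinates, then wrap the index.
                px, py = (i + 0.5) * cell, (j + 0.5) * cell
                if (px - cx) ** 2 + (py - cy) ** 2 <= splat_radius ** 2:
                    covered[i % grid][j % grid] = True
    return sum(map(sum, covered)) / grid ** 2
```

At a normalized dose of 1, this idealized model covers roughly 63% of the surface, so over a third remains bare and many splats still sit in isolation, which is why early-stage chemistry can be dominated by individual minerals rather than a homogenized mix.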
    Instability of steady-state mixed-state symmetry-protected topological order to strong-to-weak spontaneous symmetry breaking
    Jeet Shah
    Christopher Fechisin
    Yuxin Wang
    Joseph T. Iosue
    James D. Watson
    Yan-Qi Wang
    Brayden Ware
    Cheng-Ju Lin
    Alexey Gorshkov
    Quantum (2025)
    Recent experimental progress in controlling open quantum systems enables the pursuit of mixed-state nonequilibrium quantum phases. We investigate whether open quantum systems hosting mixed-state symmetry-protected topological states as steady states retain this property under symmetric perturbations. Focusing on the decohered cluster state, a mixed-state symmetry-protected topological state protected by a combined strong and weak symmetry, we construct a parent Lindbladian that hosts it as a steady state. This Lindbladian can be mapped onto exactly solvable reaction-diffusion dynamics, even in the presence of certain perturbations, allowing us to solve the parent Lindbladian in detail and reveal previously unknown steady states. Using both analytical and numerical methods, we find that typical symmetric perturbations cause strong-to-weak spontaneous symmetry breaking at arbitrarily small perturbation strengths, destabilizing the steady-state mixed-state symmetry-protected topological order. However, when perturbations introduce only weak symmetry defects, the steady-state mixed-state symmetry-protected topological order remains stable. Additionally, we construct a quantum channel that replicates the essential physics of the Lindbladian and can be efficiently simulated using only Clifford gates, Pauli measurements, and feedback.
    CURIE: Evaluating LLMs on multitask long context scientific understanding and reasoning
    Matthew Abraham
    Haining Pan
    Zahra Shamsi
    Muqthar Mohammad
    Chenfei Jiang
    Ruth Alcantara
    Gowoon Cheon
    Xuejian Ma
    Michael Statt
    Jackson Cui
    Nayantara Mudur
    Eun-Ah Kim
    Paul Raccuglia
    Victor V. Albert
    Lizzie Dorfman
    Brian Rohr
    Shutong Li
    Maria Tikhanovskaya
    Drew Purves
    Elise Kleeman
    Philippe Faist
    Ean Phing VanLee
    International Conference on Learning Representations (ICLR) (2025)
    The core of the scientific problem-solving process involves synthesizing information while applying expert knowledge. Large Language Models (LLMs) have the potential to accelerate this process due to their extensive knowledge across a variety of domains. Recent advancements have also made it possible for LLMs to handle very long "in-context" content. However, existing evaluations of long-context LLMs have focused on assessing their ability to summarize or retrieve information within the given context, primarily in generalist tasks that do not require deep scientific expertise. To facilitate analogous assessments of domain-specific tasks, we introduce the scientific long-Context Understanding and Reasoning Inference Evaluations (CURIE) benchmark. This benchmark provides a set of 8 challenging tasks, derived from around 250 scientific research papers, requiring domain expertise, comprehension of long in-context information, and multi-step reasoning that tests the ability of LLMs to assist scientists in realistic workflows. Tasks in CURIE have been collected from experts in six disciplines (materials science, theoretical condensed matter physics, quantum computing, geospatial analysis, biodiversity, and protein sequencing), covering both experimental and theoretical workflows in science. We evaluate a range of closed and open LLMs on these tasks. Additionally, we propose strategies for task decomposition, which allow for a more nuanced evaluation of the models and facilitate staged multi-step assessments. We hope that insights gained from CURIE can guide the future development of LLMs.
    Confinement in a Z2 lattice gauge theory on a quantum computer
    Julius Mildenberger
    Wojtek Mruczkiewicz
    Jad C. Halimeh
    Philipp Hauke
    Nature Physics (2025)
    Gauge theories describe the fundamental forces in the standard model of particle physics and play an important role in condensed-matter physics. The constituents of gauge theories, for example, charged matter and electric gauge field, are governed by local gauge constraints, which lead to key phenomena, such as the confinement of particles, that are not fully understood. In this context, quantum simulators may address questions that are challenging for classical methods. Although engineering gauge constraints is highly demanding, recent advances in quantum computing are beginning to enable digital quantum simulations of gauge theories. Here we simulate confinement dynamics in a Z2 lattice gauge theory on a superconducting quantum processor. Tuning a term that couples only to the electric field produces confinement of charges, a manifestation of the tight bond that the gauge constraint generates between the two. Moreover, we show how a modification of the gauge constraint from Z2 towards U(1) symmetry freezes the system dynamics. Our work illustrates the restriction that the underlying gauge constraint imposes on the dynamics of a lattice gauge theory, showcases how gauge constraints can be modified and protected, and promotes the study of other models governed by multibody interactions.