Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Leveraging Per-Example Privacy for Machine Unlearning
Nazanin Mohammadi Sepahvand
Anvith Thudi
Ashmita Bhattacharyya
Nicolas Papernot
Eleni Triantafillou
Daniel M. Roy
Karolina Dziugaite
International Conference on Machine Learning (ICML) (2025)
Abstract
This work focuses on developing fine-grained theoretical insights to quantify unlearning difficulty at the level of individual data points for fine-tuning-based unlearning. Unlike other unlearning methods that lack theoretical guarantees for non-convex models, our approach builds on recent advances in differential privacy to provide per-instance guarantees using Rényi divergence. While our theoretical analysis applies to Langevin dynamics, we empirically demonstrate that the derived guarantees—and their trends—continue to hold for fine-tuning, even in the absence of explicit noise. Our results show that per-instance privacy levels computed from training dynamics reliably predict unlearning difficulty, offering a principled and practical way to assess unlearning performance. Furthermore, our method identifies harder-to-unlearn data more effectively than existing heuristics, providing a more precise tool for guiding unlearning strategies. These findings pave the way for adaptive and efficient unlearning methods tailored to the properties of specific data points.
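The guarantees above are phrased in terms of Rényi divergence. As a minimal illustration of that quantity (not the authors' per-instance accounting), the sketch below computes the order-α Rényi divergence between two discrete distributions, together with the standard closed form for two Gaussians with equal variance, which is the case that arises when Gaussian noise is injected as in Langevin dynamics. The function names are hypothetical.

```python
import math
from typing import Sequence


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Order-alpha Renyi divergence D_alpha(P || Q) for discrete distributions.

    D_alpha(P || Q) = 1 / (alpha - 1) * log(sum_x p(x)^alpha * q(x)^(1 - alpha)).
    """
    if alpha <= 1:
        raise ValueError("this sketch only handles alpha > 1")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # outcomes with zero probability under P contribute nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi ** alpha * qi ** (1 - alpha)
    return math.log(total) / (alpha - 1)


def gaussian_renyi(mu1: float, mu2: float, sigma: float, alpha: float) -> float:
    """Closed form for N(mu1, sigma^2) vs N(mu2, sigma^2): alpha * (mu1 - mu2)^2 / (2 sigma^2)."""
    return alpha * (mu1 - mu2) ** 2 / (2 * sigma ** 2)
```

Under the paper's framing, a data point whose presence shifts the training outcome distribution less (a smaller divergence) should be easier to unlearn; this sketch only computes the divergence itself.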
Zoom in, Zoom out, Reframe: Domain Experts’ Strategies for Addressing Non-Experts’ Complex Questions
Beverly Freeman
Roma Ruparel
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025)
Abstract
Consumers rely on the Internet for expert information in domains such as healthcare and law. Large Language Models (LLMs) have the potential to increase access to expert knowledge. However, past research has not addressed how to handle certain aspects of complex questions that commonly occur in expert-layperson interactions. We conducted in-depth interviews with 26 experts across multiple domains to understand how they experience and respond to challenges associated with non-experts’ questions. Results from a thematic analysis reveal three recurring strategies that experts across domains employ when fielding complex questions. Experts zoom in to clarify details of a broad information request, zoom out to address overly narrow questions or assumptions, and reframe when the underlying need is unstated or poorly represented. We discuss implications for the design of LLM-based experiences that facilitate access to expert information.
AI and Generative AI Transforming Disaster Management: A Survey of Damage Assessment and Response Techniques
Aman Raj
Shashank Kapoor
IEEE COMPSAC 2025 (2025)
Abstract
Natural disasters, including earthquakes, wildfires and cyclones, pose a major risk to human lives and infrastructure. An effective disaster response depends on the ability to assess the severity of damage rapidly and efficiently. Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) offer a breakthrough approach, capable of combining knowledge from multiple types and sources of data, simulating realistic disaster scenarios, and identifying emerging trends at a speed previously unimaginable. In this paper, we present a comprehensive review of the prospects of AI and GenAI for damage assessment across various natural disasters, highlighting both their strengths and limitations. We discuss their application to multimodal data such as text, images, video, and audio, and also cover major issues of data privacy, security, and the ethical use of the technology during crises. The paper also recognizes the threat of GenAI misuse, in the form of misinformation dissemination and adversarial attacks. Finally, we outline avenues for future research, emphasizing the need for secure, reliable, and ethical GenAI systems for disaster management in general. We believe that this work represents the first comprehensive survey of GenAI techniques used in disaster assessment and response.
Abstract
Unifying query languages is key to reducing toil for app developers and end users who query and analyze observability data. A common query language could leverage all observability data, such as metrics, traces, profiles, events, and logs, to facilitate correlation, support trend analytics, and provide end-to-end observability for AI applications. The Observability TAG QLS workgroup is finalizing a semantic query language spec in 2025 and is recommending SQL as a basis, with further experimentation on syntaxes. This talk will explore the design principles, user research, and challenges of creating a query language to support observability goals. It will delve into the core concepts, syntax, and semantics of SQL operators and the syntactic sugar they need, while addressing the unique requirements of observability data. It will also explore the trade-offs between simplicity, expressiveness, and performance. This query language convergence for end-to-end analytics could enhance reliability and operational efficiency for SREs and app developers. A win-win for all.
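The QLS specification itself is not reproduced here. As a hypothetical illustration of why a SQL basis helps with correlation, the sketch below uses Python's built-in `sqlite3` module to join trace spans with metrics from the same service and minute. The table names (`spans`, `metrics`), their columns, and the `cpu_utilization` metric name are assumptions, not part of the spec.

```python
import sqlite3


def slow_spans_with_cpu(conn: sqlite3.Connection, min_ms: float) -> list:
    # Correlate slow trace spans with the CPU metric of the same service and minute.
    # Hypothetical schema; a real observability backend would define its own relations.
    query = """
        SELECT s.trace_id, s.service, s.duration_ms, m.value AS cpu
        FROM spans AS s
        JOIN metrics AS m
          ON m.service = s.service
         AND m.minute = s.start_minute
         AND m.name = 'cpu_utilization'
        WHERE s.duration_ms >= ?
        ORDER BY s.duration_ms DESC
    """
    return conn.execute(query, (min_ms,)).fetchall()


# Toy in-memory data standing in for trace and metric stores.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE spans (trace_id TEXT, service TEXT, start_minute INTEGER, duration_ms REAL);
    CREATE TABLE metrics (service TEXT, minute INTEGER, name TEXT, value REAL);
    INSERT INTO spans VALUES ('t1', 'checkout', 10, 950.0),
                             ('t2', 'checkout', 11, 40.0),
                             ('t3', 'search', 10, 700.0);
    INSERT INTO metrics VALUES ('checkout', 10, 'cpu_utilization', 0.75),
                               ('checkout', 11, 'cpu_utilization', 0.25),
                               ('search', 10, 'cpu_utilization', 0.5);
""")
rows = slow_spans_with_cpu(conn, 500.0)
```

Expressing the correlation as an ordinary join is the point of the example: once metrics and traces are exposed as relations, standard SQL operators already cover much cross-signal analysis, and observability-specific syntactic sugar can be layered on top.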
Abstract
Continuous Integration (CI) is an essential software development practice that establishes processes to minimize bugs and errors in production. In a similar vein, experimentation on software products is vital for evaluating user satisfaction, quality, performance, and other key business metrics. Experimentation allows product owners to evaluate the user impact of changes, which helps them make informed decisions about feature launches. Experimentation also allows developers to tweak internal processes and algorithms to maximize the impact of new features and changes. Additionally, it can sometimes detect errors that CI misses.
Unlike CI systems, experimentation platforms are meant to closely imitate production and usually run the system under test (SUT) against large volumes of input. Despite this, experimentation platforms have a lot in common with CI systems: the mechanisms for continuously integrating and testing changes can be adapted and applied to experimentation platforms.
Google Search's experimentation platform started as a command-line tool many years ago. Over time, this tool has evolved into a platform that serves the evaluation needs of many of Google's products, such as Search, Assistant, YouTube, Play, and Lens, running thousands of large experiments every day.
In this workshop, we will present the evolution of Google Search's experimentation platform and how it was transformed from a simple CLI tool into a platform that works at scale, fulfills continuous experimentation needs and provides many CI-like functionalities to its users.
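A CI-style check can be applied to an experiment by scoring the same inputs under the control and experiment configurations and failing when the metric regresses beyond a tolerance. The sketch below assumes a single scalar quality metric per input; `experiment_gate`, `GateResult`, and the tolerance are hypothetical and do not describe Google's platform.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class GateResult:
    delta: float   # mean (experiment - control) score over all inputs
    passed: bool   # False if the regression exceeds the tolerance


def experiment_gate(
    inputs: Sequence[str],
    control: Callable[[str], float],
    experiment: Callable[[str], float],
    max_regression: float,
) -> GateResult:
    """Hypothetical CI-like gate: score each input under both arms and compare."""
    if not inputs:
        raise ValueError("need at least one input")
    # Paired deltas: the same input is scored by both configurations.
    deltas = [experiment(x) - control(x) for x in inputs]
    delta = mean(deltas)
    return GateResult(delta=delta, passed=delta >= -max_regression)
```

Scoring each input under both arms (a paired comparison) removes input-to-input variation from the delta, which matters when the input set is large and heterogeneous.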
Generative AI for medical education: Insights from a case study with medical students and an AI tutor for clinical reasoning
Amy Wang
Roma Ruparel
Paul Jhun
Julie Anne Seguin
Patricia Strachan
Renee Wong
2025
Abstract
Generative Artificial Intelligence (AI), particularly Large Language Models (LLMs), has demonstrated significant potential in clinical reasoning skills such as history-taking and differential diagnosis generation, which are critical aspects of medical education. This work explores how LLMs can augment medical curricula through interactive learning. We conducted a participatory design process with medical students, residents, and medical education experts to co-create an AI-powered tutor prototype for clinical reasoning. As part of the co-design process, we conducted a qualitative user study, investigating learning needs and practices via interviews and carrying out concept evaluations through interactions with the prototype. Findings highlight the challenges learners face in transitioning from theoretical knowledge to practical application, and how an AI tutor can provide personalized practice and feedback. We conclude with design considerations, emphasizing the importance of context-specific knowledge and emulating positive preceptor traits, to guide the development of AI tools for medical education.
Avoid global outages by partitioning cloud applications to reduce blast radius
Karan Anand
https://cloud.google.com/ (2025)
Abstract
Cloud application development faces the inherent challenge of balancing rapid innovation with high availability. This blog post details how Google Workspace's Site Reliability Engineering team addresses this conflict by implementing vertical partitioning of serving stacks. By isolating application servers and storage into distinct partitions, the "blast radius" of code changes and updates is significantly reduced, minimizing the risk of global outages. This approach, which complements canary deployments, enhances service availability, provides flexibility for experimentation, and facilitates data localization. While challenges such as data model complexities and inter-service partition misalignment exist, the benefits of improved reliability and controlled deployments make partitioning a crucial strategy for maintaining robust cloud applications.
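The post does not specify how users are mapped to partitions. One common way to realize the idea, sketched below under that assumption, is to assign each tenant to a partition with a stable hash and roll changes out in waves of partitions, so a faulty change initially reaches only the tenants in the first wave. `partition_for`, `rollout_waves`, and `blast_radius` are hypothetical names.

```python
import hashlib
from typing import AbstractSet, List, Sequence


def partition_for(tenant_id: str, num_partitions: int) -> int:
    """Stable assignment: the same tenant always lands in the same partition."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def rollout_waves(num_partitions: int, wave_sizes: Sequence[int]) -> List[List[int]]:
    """Split partitions into successive rollout waves; leftovers form a final wave."""
    waves, start = [], 0
    for size in wave_sizes:
        waves.append(list(range(start, min(start + size, num_partitions))))
        start += size
    if start < num_partitions:
        waves.append(list(range(start, num_partitions)))
    return [w for w in waves if w]  # drop empty waves if sizes overshoot


def blast_radius(tenants: Sequence[str], bad: AbstractSet[int], num_partitions: int) -> float:
    """Fraction of tenants affected if a change breaks only the partitions in `bad`."""
    hit = sum(1 for t in tenants if partition_for(t, num_partitions) in bad)
    return hit / len(tenants)
```

Because the hash is stable, a tenant stays in the same partition across releases, so a failure confined to the first wave's partitions bounds the affected population to those tenants.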
Abstract
We present a scalable and agile approach for ads image content moderation at Google, addressing the challenges of moderating massive volumes of ads with diverse content and evolving policies. The proposed method utilizes human-curated textual descriptions and cross-modal text-image co-embeddings to enable zero-shot classification of policy violating ads images, bypassing the need for extensive supervised training data and human labeling. By leveraging large language models (LLMs) and user expertise, the system generates and refines a comprehensive set of textual descriptions representing policy guidelines. During inference, co-embedding similarity between incoming images and the textual descriptions serves as a reliable signal for policy violation detection, enabling efficient and adaptable ads content moderation. Evaluation results demonstrate the efficacy of this framework in significantly boosting the detection of policy violating content.
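At inference time the method reduces to a similarity test in the shared embedding space. The sketch below assumes the image and the policy descriptions have already been embedded by the same co-embedding model (not shown; the vectors in the example are toy values), and flags an image when its best cosine similarity to any description reaches a threshold. `classify_image`, the threshold, and the description strings are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_image(
    image_emb: Sequence[float],
    policy_descriptions: Dict[str, List[float]],
    threshold: float,
) -> Tuple[bool, str, float]:
    """Zero-shot check: return (flagged, closest description, its similarity)."""
    best_desc, best_sim = "", -1.0
    for desc, text_emb in policy_descriptions.items():
        sim = cosine(image_emb, text_emb)
        if sim > best_sim:
            best_desc, best_sim = desc, sim
    return best_sim >= threshold, best_desc, best_sim


# Toy 3-d embeddings standing in for real co-embeddings.
descriptions = {
    "counterfeit goods": [1.0, 0.0, 0.0],
    "tobacco product": [0.0, 1.0, 0.0],
}
```

Because no per-policy classifier is trained, supporting a new or revised policy amounts to adding or editing a text description and its embedding.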
A Recipe for Improving Remote Sensing Zero Shot Generalization
Aviad Barzilai
Yotam Gigi
Vered Silverman
Yehonathan Refael
Bolous Jaber
Amr Helmy
3rd ML4RS Workshop at ICLR 2025
Abstract
Foundation models have had a significant impact across various AI applications, enabling applications for use cases that were previously impossible. Visual language models (VLMs), in particular, have outperformed other techniques in many tasks. In remote sensing (RS), foundation models have shown improvements across various applications. However, unlike other fields, the use of VLMs with large-scale remote sensing image-text datasets remains limited.
In this work, we first introduce two novel image-caption datasets for training remote sensing foundation models. The first dataset pairs aerial and satellite imagery, aligned with Google Maps data, with high-quality captions generated using Gemini. The second uses public web images and their corresponding alt-text, filtered to the remote sensing domain, resulting in a highly diverse dataset.
We show that using these datasets to pre-train Mammut [], a VLM architecture, results in state-of-the-art generalization performance in zero-shot classification and cross-modal retrieval on well-known public benchmarks. Second, we leverage this newly pre-trained VLM to generate inference attention maps for a novel class query (i.e., a class unseen during training). We then propose an iterative self-supervised fine-tuning approach in which samples aligned with these attention maps are iteratively pseudo-labeled and used for model training.
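The iterative fine-tuning step can be outlined as a self-training loop. In the sketch below, `score` stands in for whatever measures a sample's alignment with the attention map for the novel class, and `train` stands in for fine-tuning; both are hypothetical callables, since the abstract does not specify the alignment criterion or the stopping rule.

```python
from typing import Callable, List, Sequence, Set, Tuple


def iterative_pseudo_label(
    unlabeled: Sequence[str],
    score: Callable[[str], float],                 # hypothetical alignment score under the current model
    train: Callable[[List[Tuple[str, str]]], None],  # hypothetical fine-tuning step
    class_name: str,
    threshold: float,
    rounds: int,
) -> List[Tuple[str, str]]:
    labeled: List[Tuple[str, str]] = []
    used: Set[str] = set()
    for _ in range(rounds):
        # Pseudo-label samples that the current model aligns strongly with the class.
        new = [x for x in unlabeled if x not in used and score(x) >= threshold]
        if not new:
            break  # nothing new passed the threshold; stop early
        for x in new:
            used.add(x)
            labeled.append((x, class_name))
        train(labeled)  # fine-tune on all pseudo-labels gathered so far
    return labeled
```

After each `train` call the model changes, so later rounds can admit samples that were below the threshold earlier; that feedback is what makes the procedure iterative rather than a single filtering pass.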
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Alexander Immer
Alex Bo-Yuan Chen
Mariela D. Petkova
Nirmala A. Iyer
Luuk Willem Hesselink
Aparna Dev
Gudrun Ihrke
Woohyun Park
Alyson Petruncio
Aubrey Weigel
Wyatt Korff
Florian Engert
Jeff W. Lichtman
Misha B. Ahrens
International Conference on Learning Representations (ICLR) (2025)
Abstract
Data-driven benchmarks have led to significant progress in key scientific modeling domains including weather and structural biology. Here, we present the Zebrafish Activity Prediction Benchmark (ZAPBench), which quantitatively measures progress on the problem of predicting cellular-resolution neural activity throughout an entire vertebrate brain. The benchmark is based on a novel dataset containing 4D light-sheet microscopy recordings of more than 70,000 neurons in a larval zebrafish brain, along with motion-stabilized and voxel-level cell segmentations of these data that facilitate development of a variety of forecasting methods. Initial results from a selection of time series and volumetric video modeling approaches achieve better performance than naive baseline methods, but also show room for further improvement. The specific brain used in the activity recording is also undergoing synaptic-level anatomical mapping, which will enable future integration of detailed structural information into ZAP forecasting methods.
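The abstract compares forecasting models with naive baselines. As an assumption about what such baselines typically look like (the benchmark's exact baselines and metrics may differ), the sketch below implements two simple forecasters for a time × neurons activity trace, repeating the last observed frame or repeating each neuron's context mean, and scores them with mean absolute error. All names are hypothetical.

```python
from typing import List

Trace = List[List[float]]  # indexed as [time][neuron]


def last_value_forecast(context: Trace, horizon: int) -> Trace:
    """Repeat the final context frame for every future step."""
    return [list(context[-1]) for _ in range(horizon)]


def mean_forecast(context: Trace, horizon: int) -> Trace:
    """Repeat each neuron's mean over the context window for every future step."""
    n = len(context[0])
    means = [sum(frame[i] for frame in context) / len(context) for i in range(n)]
    return [list(means) for _ in range(horizon)]


def mae(pred: Trace, target: Trace) -> float:
    """Mean absolute error over all time steps and neurons."""
    errs = [abs(p - t) for pf, tf in zip(pred, target) for p, t in zip(pf, tf)]
    return sum(errs) / len(errs)
```

A learned forecaster is only interesting if it beats baselines like these on held-out windows, which is why benchmarks report them alongside model results.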
Improving simulation-based origin-destination demand calibration using sample segment counts data
Arwa Alanqary
Yechen Li
The 12th Triennial Symposium on Transportation Analysis conference (TRISTAN XII), Okinawa, Japan (2025)
Abstract
This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the conventional reliance on a prior OD matrix. The proposed formulation aims to preserve the distribution of the observed track counts while optimizing the demand to align with observed path-level travel times. We tested this approach on Seattle's highway network under various congestion levels. Our findings reveal significant improvements in solution quality, particularly in accurately recovering ground-truth demand patterns at both the OD and segment levels.
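The formulation adds a regularization term to a travel-time fit. The sketch below is a simplified stand-in: a hypothetical linear `assignment` matrix replaces the traffic simulator when mapping OD demand to segment flows, `travel_times` is a hypothetical callable, and squared error between normalized count vectors is an assumed choice of distance. It shows the structure of the objective, not the paper's exact loss.

```python
from typing import Callable, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    total = sum(v)
    return [x / total for x in v] if total else [0.0] * len(v)


def calibration_objective(
    demand: Sequence[float],
    assignment: Sequence[Sequence[float]],  # segments x OD pairs (hypothetical linear stand-in)
    travel_times: Callable[[Sequence[float]], Sequence[float]],  # hypothetical simulator call
    obs_times: Sequence[float],
    sample_counts: Sequence[float],
    lam: float,
) -> float:
    # Segment flows implied by the candidate demand under the linear stand-in.
    flows = [sum(a * d for a, d in zip(row, demand)) for row in assignment]
    # Fit term: mismatch between simulated and observed path-level travel times.
    time_loss = sum((s - o) ** 2 for s, o in zip(travel_times(demand), obs_times))
    # Regularizer: compare count distributions, not magnitudes.
    count_loss = sum(
        (f - c) ** 2 for f, c in zip(normalize(flows), normalize(sample_counts))
    )
    return time_loss + lam * count_loss
```

Normalizing both count vectors makes the penalty compare shapes rather than magnitudes, which suits counts observed for only a sample of vehicles.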
GOALIE (GOAL oriented IntErventions) Proactive Multimodal Agent to Assist Augmented Reality
Saptarashmi Bandyopadhyay
Vikas Bahirwani
Lavisha Aggarwal
Bhanu Guda
Lin Li
Qin Liu
Tom Goldstein
John Dickerson
Andrea Colaco
2025
Abstract
Multimodal AI Agents can assist and guide users in completing real-time tasks such as cooking, robotics, and manufacturing. An emerging form of multimodal communication is Augmented Reality (AR), where an AI Agent can enhance the user experience with step-by-step task guidance by observing the user's vision and language inputs. Current LLM- or VLM-based agents are reactive, waiting for a user query before responding. Proactive AI Agents in AR focus on detecting when the AI Agent should autonomously intervene to fix mistakes or follow up on an instruction. Our GOALIE (GOAL-oriented IntErvention) Agent is the first multimodal proactive AR agent that guides the user step by step on its own. We build an innovative zero-shot prompting framework, PSoS (Proactive Sequence of Steps), whose context comprises abstracted past user actions, the agent's previous responses, and the user's granular goals and actions up to the moment an intervention is deemed necessary. We use PSoS for Supervised Finetuning (SFT), Direct Preference Optimization (DPO), and Group-Relative Policy Optimization (GRPO) finetuning of our AI agent to improve the quality of the agent's proactive interventions. We also propose a new algorithmic framework, Bagged Group Relative Policy Optimization (BRPO), to reduce the variance in rewards of generation groups, to adapt the finetuning algorithm for multimodal proactive interventions by the AI Agent, and to enable real-time finetuning of the AI model. We compare the step-by-step intervention quality and efficiency of the GOALIE Agent against Gemma-3 models and other VLMs on task execution, using human expert labels. We conduct a human evaluation of the proactive interventions, demonstrating user satisfaction with the GOALIE Agent's proactive interventions. We will release the code, model, and human evaluation data.
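GRPO, one of the finetuning methods listed above, scores each generation relative to the other generations sampled for the same prompt. The sketch below shows that standard group-relative advantage (reward minus group mean, divided by group standard deviation). The BRPO modifications are not described in enough detail here to reproduce, so they are not shown; `group_relative_advantages` and `eps` are hypothetical names.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one group of generations from the same prompt."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)  # population std of the group
    # eps keeps the division finite when every generation gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Normalizing within the group removes the need for a learned value baseline, but it also makes the advantages sensitive to reward noise within small groups, which is the kind of variance a bagged variant would target.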
Fast electronic structure quantum simulation by spectrum amplification
Guang Hao Low
Robbie King
Dominic Berry
Qiushi Han
Albert Eugene DePrince III
Alec White
Rolando Somma
arXiv:2502.15882 (2025)
Abstract
The most advanced techniques for using fault-tolerant quantum computers to estimate the ground-state energy of a chemical Hamiltonian compress the Coulomb operator through tensor factorizations, enabling efficient block-encodings of the Hamiltonian. A natural challenge for these methods is the degree to which block-encoding costs can be reduced. We address this challenge through the technique of spectrum amplification, which magnifies the spectrum of the low-energy states of Hamiltonians that can be expressed as sums of squares. Spectrum amplification enables estimating ground-state energies with significantly improved cost scaling in the block-encoding normalization factor $\Lambda$, reducing it to just $\sqrt{2\Lambda E_{\text{gap}}}$, where $E_{\text{gap}} \ll \Lambda$ is the lowest energy of the sum-of-squares Hamiltonian. To achieve this, we show that sum-of-squares representations of the electronic structure Hamiltonian are efficiently computable by a family of classical simulation techniques that approximate the ground-state energy from below. To optimize further, we also develop a novel factorization that provides a trade-off between the two leading Coulomb integral factorization schemes, namely double factorization and tensor hypercontraction. When combined with spectrum amplification, it yields a 4 to 195 times speedup over the state of the art in ground-state energy estimation for models of iron-sulfur complexes and a CO$_{2}$-fixation catalyst.
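Taking the scaling statement at face value, the improvement factor from spectrum amplification alone is the ratio of the old and new cost-determining factors:

$$
\frac{\Lambda}{\sqrt{2\Lambda E_{\text{gap}}}} = \sqrt{\frac{\Lambda}{2E_{\text{gap}}}}
$$

For example, a hypothetical ratio $\Lambda / E_{\text{gap}} = 200$ gives a factor of $\sqrt{100} = 10$. The reported 4 to 195 times speedups come from combining spectrum amplification with the new factorization, so they are not given by this ratio alone.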
Visualizing Dynamics of Charges and Strings in (2+1)D Lattice Gauge Theories
Tyler Cochran
Bernhard Jobst
Yuri Lensky
Gaurav Gyawali
Norhan Eassa
Melissa Will
Aaron Szasz
Dmitry Abanin
Rajeev Acharya
Laleh Beni
Trond Andersen
Markus Ansmann
Frank Arute
Kunal Arya
Abe Asfaw
Juan Atalaya
Brian Ballard
Alexandre Bourassa
Michael Broughton
David Browne
Brett Buchea
Bob Buckley
Tim Burger
Nicholas Bushnell
Anthony Cabrera
Juan Campero
Hung-Shen Chang
Jimmy Chen
Benjamin Chiaro
Jahan Claes
Agnetta Cleland
Josh Cogan
Roberto Collins
Paul Conner
William Courtney
Alex Crook
Ben Curtin
Sayan Das
Laura De Lorenzo
Agustin Di Paolo
Paul Donohoe
ILYA Drozdov
Andrew Dunsworth
Alec Eickbusch
Aviv Elbag
Mahmoud Elzouka
Vinicius Ferreira
Ebrahim Forati
Austin Fowler
Brooks Foxen
Suhas Ganjam
Robert Gasca
Élie Genois
William Giang
Dar Gilboa
Raja Gosula
Alejo Grajales Dau
Dietrich Graumann
Alex Greene
Steve Habegger
Monica Hansen
Sean Harrington
Paula Heu
Oscar Higgott
Jeremy Hilton
Robert Huang
Ashley Huff
Bill Huggins
Cody Jones
Chaitali Joshi
Pavol Juhas
Hui Kang
Amir Karamlou
Kostyantyn Kechedzhi
Trupti Khaire
Bryce Kobrin
Alexander Korotkov
Fedor Kostritsa
John Mark Kreikebaum
Vlad Kurilovich
Dave Landhuis
Tiano Lange-Dei
Brandon Langley
Kim Ming Lau
Justin Ledford
Kenny Lee
Loick Le Guevel
Wing Li
Alexander Lill
Will Livingston
Daniel Lundahl
Aaron Lunt
Sid Madhuk
Ashley Maloney
Salvatore Mandra
Leigh Martin
Orion Martin
Cameron Maxfield
Seneca Meeks
Anthony Megrant
Reza Molavi
Sebastian Molina
Shirin Montazeri
Ramis Movassagh
Charles Neill
Michael Newman
Murray Ich Nguyen
Chia Ni
Kris Ottosson
Alex Pizzuto
Rebecca Potter
Orion Pritchard
Ganesh Ramachandran
Matt Reagor
David Rhodes
Gabrielle Roberts
Kannan Sankaragomathi
Henry Schurkus
Mike Shearn
Aaron Shorter
Noah Shutty
Vladimir Shvarts
Vlad Sivak
Spencer Small
Clarke Smith
Sofia Springer
George Sterling
Jordan Suchard
Alex Sztein
Doug Thor
Mert Torunbalci
Abeer Vaishnav
Justin Vargas
Sergey Vdovichev
Guifre Vidal
Steven Waltman
Shannon Wang
Brayden Ware
Kristi Wong
Cheng Xing
Jamie Yao
Ping Yeh
Bicheng Ying
Juhwan Yoo
Grayson Young
Yaxing Zhang
Ningfeng Zhu
Yu Chen
Vadim Smelyanskiy
Adam Gammon-Smith
Frank Pollmann
Michael Knap
Nature, 642 (2025), 315–320
Abstract
Lattice gauge theories (LGTs) can be used to understand a wide range of phenomena, from elementary particle scattering in high-energy physics to effective descriptions of many-body interactions in materials. Studying dynamical properties of emergent phases can be challenging, as it requires solving many-body problems that are generally beyond perturbative limits. Here we investigate the dynamics of local excitations in an LGT using a two-dimensional lattice of superconducting qubits. We first construct a simple variational circuit that prepares low-energy states that have a large overlap with the ground state; then we create charge excitations with local gates and simulate their quantum dynamics by means of a discretized time evolution. As the electric field coupling constant is increased, our measurements show signatures of transitioning from deconfined to confined dynamics. For confined excitations, the electric field induces a tension in the string connecting them. Our method allows us to experimentally image string dynamics in a (2+1)D LGT, from which we uncover two distinct regimes inside the confining phase: for weak confinement, the string fluctuates strongly in the transverse direction, whereas for strong confinement, transverse fluctuations are effectively frozen. We also demonstrate a resonance condition at which dynamical string breaking is facilitated. Our LGT implementation on a quantum processor presents a new set of techniques for investigating emergent excitations and string dynamics.
Enhancing Remote Sensing Representations through Mixed-Modality Masked Autoencoding
Ori Linial
Yochai Blau
Nadav Sherman
Yotam Gigi
Wojciech Sirko
Proceedings of the Winter Conference on Applications of Computer Vision (WACV) Workshops (2025), pp. 507-516
Abstract
This paper presents an innovative approach to pre-training models for remote sensing by integrating optical and radar data from Sentinel-2 and Sentinel-1 satellites. Using a novel variation on the masked autoencoder (MAE) framework, our model incorporates a dual-task setup: reconstructing masked Sentinel-2 images and predicting corresponding Sentinel-1 images. This multi-task design enables the encoder to capture both spectral and structural features across diverse environmental conditions. Additionally, we introduce a "mixing" strategy in the pretraining phase, combining patches from both image sources, which mitigates spatial misalignment errors and enhances model robustness. Evaluation on segmentation and classification tasks, including Sen1Floods11 and BigEarthNet, demonstrates significant improvements in adaptability and generalizability across varied downstream remote sensing applications. Our findings highlight the advantages of leveraging complementary modalities for more resilient and versatile land cover analysis.
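The mixing strategy can be pictured at the level of patch tokens. In the sketch below, each patch position is drawn from either the Sentinel-1 or the Sentinel-2 image with probability `mix_prob`, and a random subset of positions is then masked as in a standard masked autoencoder; the decoder (not shown) would reconstruct Sentinel-2 and predict Sentinel-1 at the masked positions. `mix_and_mask`, `mix_prob`, and `mask_ratio` are hypothetical, and the paper's exact mixing scheme may differ.

```python
import random
from typing import List, Sequence, Tuple


def mix_and_mask(
    s2_patches: Sequence[object],
    s1_patches: Sequence[object],
    mix_prob: float,
    mask_ratio: float,
    rng: random.Random,
) -> Tuple[List[object], List[str], List[int]]:
    """Return (visible patches, their source sensor, indices of masked positions)."""
    if len(s2_patches) != len(s1_patches):
        raise ValueError("both images must be split into the same patch grid")
    n = len(s2_patches)
    # Per position, take the Sentinel-1 patch with probability mix_prob, else Sentinel-2.
    sources = ["S1" if rng.random() < mix_prob else "S2" for _ in range(n)]
    mixed = [s1 if src == "S1" else s2 for s2, s1, src in zip(s2_patches, s1_patches, sources)]
    # MAE-style masking: hide a random subset of positions from the encoder.
    masked = sorted(rng.sample(range(n), int(round(mask_ratio * n))))
    masked_set = set(masked)
    visible = [mixed[i] for i in range(n) if i not in masked_set]
    visible_sources = [sources[i] for i in range(n) if i not in masked_set]
    return visible, visible_sources, masked
```

Because the encoder sees a mixture of both sensors at the same locations, it cannot rely on a single modality, which is one plausible reading of how mixing adds robustness to spatial misalignment.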