Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Zheng Xu
Kristen Wright
Jason Mayes
Mark Sherwood
Johnny Lee
Alex Olwal
Ram Iyengar
Na Li
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
Abstract
Visual programming has the potential to provide novice programmers with a low-code experience for building customized processing pipelines. Existing systems typically require users to build pipelines from scratch, implying that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both a technical evaluation and a user study (N=16) show that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.
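As a rough illustration of the interpreter step, the sketch below renders LLM-generated pseudocode into a node graph. The JSON pseudocode format and the node registry are invented for illustration and are not InstructPipe's actual schema.

```python
# Illustrative sketch only: validate LLM-emitted pipeline pseudocode and
# build a node-graph description. Node types and format are assumptions.
import json

KNOWN_NODES = {"image_input", "body_segmentation", "mask_visualizer"}  # assumed registry

def render_pipeline(pseudocode_json: str) -> dict:
    """Validate generated pseudocode and build a node-graph description."""
    spec = json.loads(pseudocode_json)
    graph = {"nodes": [], "edges": []}
    for node in spec["nodes"]:
        if node["type"] not in KNOWN_NODES:
            raise ValueError(f"Unknown node type: {node['type']}")
        graph["nodes"].append({"id": node["id"], "type": node["type"]})
    for src, dst in spec["edges"]:
        graph["edges"].append({"from": src, "to": dst})
    return graph

example = ('{"nodes": [{"id": "in", "type": "image_input"},'
           ' {"id": "seg", "type": "body_segmentation"}], "edges": [["in", "seg"]]}')
print(render_pipeline(example))
```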
Wave: Offloading Resource Management to SmartNIC Cores
Jack Humphries
Neel Natu
Kostis Kaffes
Hank Levy
Christos Kozyrakis
2025
Abstract
SmartNICs are increasingly deployed in datacenters to offload tasks from server CPUs, improving the efficiency and flexibility of datacenter security, networking, and storage. Optimizing cloud server efficiency in this way is critically important to ensure that virtually all server resources are available to paying customers. Userspace system software, specifically the decision-making tasks performed by various operating system subsystems, is particularly well suited for execution on mid-tier SmartNIC ARM cores. To this end, we introduce Wave, a framework for offloading userspace system software to processes/agents running on the SmartNIC. Wave uses Linux userspace systems to better align system functionality with SmartNIC capabilities. It also introduces a new host-SmartNIC communication API that enables offloading of even μs-scale system software. To evaluate Wave, we offloaded preexisting userspace system software including kernel thread scheduling, memory management, and an RPC stack to SmartNIC ARM cores, which showed a performance degradation of 1.1%-7.4% in an apples-to-apples comparison with on-host implementations. Wave recovered host resources consumed by on-host system software for memory management (saving 16 host cores), RPCs (saving 8 host cores), and virtual machines (an 11.2% performance improvement). Wave highlights the potential for rethinking system software placement in modern datacenters, unlocking new opportunities for efficiency and scalability.
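To illustrate the offload pattern (not Wave's actual API), the sketch below ships a scheduling decision to an agent process standing in for a SmartNIC ARM core. Wave's real channel is a low-latency host-NIC transport, not Python queues, and its scheduling policies are far richer.

```python
# Illustrative sketch only: a host asks an offloaded agent (stand-in for a
# SmartNIC core) to make a thread-placement decision over a message channel.
import multiprocessing as mp

def nic_agent(requests: mp.Queue, replies: mp.Queue) -> None:
    """Agent loop: make placement decisions on behalf of the host."""
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            break
        thread_id, runnable_cpus = msg
        replies.put((thread_id, min(runnable_cpus)))  # toy policy: lowest CPU id

if __name__ == "__main__":
    req, rep = mp.Queue(), mp.Queue()
    agent = mp.Process(target=nic_agent, args=(req, rep))
    agent.start()
    req.put((42, [3, 1, 7]))  # host asks: where should thread 42 run?
    print(rep.get())          # -> (42, 1)
    req.put(None)
    agent.join()
```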
Abstract
We introduce sum-of-squares spectral amplification (SOSSA), a framework for improving quantum simulation algorithms relevant to low-energy problems. SOSSA first represents the Hamiltonian as a sum-of-squares and then applies spectral amplification to amplify the low-energy spectrum. The sum-of-squares representation can be obtained using semidefinite programming. We show that SOSSA can improve the efficiency of traditional methods in several simulation tasks involving low-energy states. Specifically, we provide fast quantum algorithms for energy and phase estimation that improve over the state-of-the-art in both query and gate complexities, complementing recent results on fast time evolution of low-energy states. To further illustrate the power of SOSSA, we apply it to the Sachdev-Ye-Kitaev model, a representative strongly correlated system, where we demonstrate asymptotic speedups by a factor of the square root of the system size. Notably, SOSSA was recently used in [G. H. Low et al., arXiv:2502.15882 (2025)] to achieve state-of-the-art costs for phase estimation of real-world quantum chemistry systems.
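For intuition, a hedged sketch of the central identity follows; the notation is assumed for illustration rather than taken from the paper.

```latex
% Hedged sketch; notation assumed for illustration, not taken from the paper.
% Shift the Hamiltonian so its ground-state energy E_0 sits at zero, then
% represent the shifted operator as a sum of squares, e.g. found via
% semidefinite programming over the coefficients of the operators A_j:
\[
  H - E_0 \;=\; \sum_{j} A_j^{\dagger} A_j .
\]
% Spectral amplification then works with the "square root" of this operator,
% so a small low-energy scale \Delta is resolved at scale \sqrt{\Delta}
% rather than \Delta, which is the source of the square-root speedups.
```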
Our Approach to Protecting AI Training Data
Cindy Muya
Jason Novak
Cindee Madison
Reiner Critides
Ben Kamber
Niha Vempati
Jeremy Wiesner
Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043 (2025)
Abstract
Google has over 25 years of experience protecting data from inappropriate access and unauthorized use. In the era of AI, Google has extended these best practices in data protection to ensure that the right data is used the right way to train models. This paper presents a number of these best practices, describes how Google applies them in its systems, and explains how Google Cloud customers can use Google Cloud capabilities to implement these practices themselves.
Protecting data requires both technical controls to enable safe data use at scale, and governance processes to ensure that companies have visibility and control over how their data is used. This fundamentally requires: understanding data and ensuring it has sufficient metadata in the form of attributes, controlling the data and implementing policies to allow (or disallow) certain usage based on those attributes, transforming data to enable its usage in policy compliant ways, and human oversight and governance.
Protecting data in AI inherits these requirements and introduces new requirements to account for unique AI-specific risks including memorization/recitation and the costs of training foundational models. Meeting these new risks requires new capabilities including enhanced understanding of data and model lineage as well as an increased ability to control data usage through checks on data for policy compliance at the time a training job is configured before it is run.
This white paper offers an in-depth look at data protection best practices and Google’s data protection capabilities, and is one of a series of publications about Google's Secure AI Framework (SAIF). Building upon its secure development practices, Google has developed and deployed a number of capabilities to understand, control, and transform data in its infrastructure so that data is both protected and used appropriately. This involves robust annotation systems to represent metadata and enable granular understanding of data at both an item and dataset level, policy engines that evaluate machine readable policies on that data using the metadata attributes, and sensors to understand how data is flowing across Google’s systems and raise alerts when policy violations occur. Moreover, Google has developed de-identification and anonymization systems to transform data to make it policy compliant and safer to use for AI training.
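As a minimal sketch of the attribute-based policy check described above: the attribute names and the rule below are invented for illustration, since Google's actual policy engines and metadata schemas are internal.

```python
# Illustrative sketch only: evaluate a machine-readable policy over dataset
# metadata attributes before a training job is allowed to run.
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    attributes: set = field(default_factory=set)  # e.g. {"user_data", "deidentified"}

def may_train_on(dataset: Dataset) -> bool:
    """Check policy compliance at training-job configuration time."""
    if "user_data" in dataset.attributes and "deidentified" not in dataset.attributes:
        return False  # raw user data must be transformed before training
    return True

jobs = [Dataset("logs_raw", {"user_data"}),
        Dataset("logs_clean", {"user_data", "deidentified"})]
for ds in jobs:
    print(ds.name, "->", "allowed" if may_train_on(ds) else "blocked")
```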
Triaging mammography with artificial intelligence: an implementation study
Sarah M. Friedewald
Sunny Jansen
Fereshteh Mahvar
Timo Kohlberger
David V. Schacht
Sonya Bhole
Dipti Gupta
Scott Mayer McKinney
Stacey Caron
David Melnick
Mozziyar Etemadi
Samantha Winter
Alejandra Maciel
Luca Speroni
Martha Sevenich
Arnav Agharwal
Rubin Zhang
Gavin Duggan
Shiro Kadowaki
Atilla Kiraly
Jie Yang
Basil Mustafa
Krish Eswaran
Shravya Shetty
Breast Cancer Research and Treatment (2025)
Abstract
Purpose
Many breast centers are unable to provide immediate results at the time of screening mammography, which results in delayed patient care. Implementing artificial intelligence (AI) could identify patients who may have breast cancer and shorten the time to diagnostic imaging and biopsy diagnosis.
Methods
In this prospective randomized, unblinded, controlled implementation study we enrolled 1000 screening participants between March 2021 and May 2022. The experimental group used an AI system to prioritize a subset of cases for same-visit radiologist evaluation, and same-visit diagnostic workup if necessary. The control group followed the standard of care. The primary operational endpoints were time to additional imaging (TA) and time to biopsy diagnosis (TB).
Results
The final cohort included 463 experimental and 392 control participants. The one-sided Mann-Whitney U test was employed for analysis of TA and TB. In the control group, the TA was 25.6 days [95% CI 22.0–29.9] and TB was 55.9 days [95% CI 45.5–69.6]. In comparison, the experimental group's mean TA was reduced by 25% (6.4 fewer days [one-sided 95% CI > 0.3], p<0.001) and mean TB was reduced by 30% (16.8 fewer days [one-sided 95% CI > 5.1], p=0.003). The time reduction was more pronounced for AI-prioritized participants in the experimental group. All participants eventually diagnosed with breast cancer were prioritized by the AI.
Conclusions
Implementing AI prioritization can accelerate care timelines for patients requiring additional workup, while maintaining the efficiency of delayed interpretation for most participants. Reducing diagnostic delays could contribute to improved patient adherence, decreased anxiety, and reduced disparities in access to timely care.
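For readers reproducing this style of analysis, a minimal sketch of a one-sided Mann-Whitney U test follows; the data below are synthetic placeholders, not study data.

```python
# Illustrative sketch only: one-sided Mann-Whitney U test comparing
# time-to-additional-imaging (TA) between arms, on synthetic data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
control_days = rng.exponential(scale=25.6, size=392)       # synthetic stand-ins
experimental_days = rng.exponential(scale=19.2, size=463)  # ~25% shorter on average

# One-sided test: is the experimental TA stochastically smaller than control?
stat, p = mannwhitneyu(experimental_days, control_days, alternative="less")
print(f"U={stat:.0f}, one-sided p={p:.4g}")
```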
Steering Self-Evaluation: Interpreting LLM’s Reasoning Across Domains and Languages
Praveen Hegde
2025
Abstract
Understanding and controlling the reasoning processes of large language models (LLMs) is crucial for their reliable deployment. In this work, we investigate the latent representation of self-evaluation behavior (the ability of a model to assess its own reasoning steps), which is vital for robust reasoning. Through targeted steering vector computation, we identify a direction within LLM activations that represents this self-evaluation behavior. Crucially, we demonstrate that this steering vector for self-evaluation exhibits remarkable cross-contextual efficacy, working well across different domains (e.g., math and medicine) and languages (e.g., English and Spanish). This suggests that the identified latent direction captures a fundamental, abstract representation of self-evaluation within the LLM's internal state, offering a promising avenue for interpretable and controllable reasoning across diverse applications.
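A minimal sketch of one common way to compute such a steering vector (difference of mean activations) appears below; the paper's exact method may differ, and all shapes and data here are illustrative.

```python
# Illustrative sketch only: difference-of-means steering vector, then
# adding the scaled direction to a hidden state during the forward pass.
import numpy as np

def steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """pos_acts/neg_acts: [n_examples, d_model] activations at a chosen layer,
    collected on prompts that do / do not exhibit self-evaluation."""
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden: np.ndarray, v: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Shift a hidden state along the self-evaluation direction."""
    return hidden + alpha * v

d = 8
rng = np.random.default_rng(1)
v = steering_vector(rng.normal(size=(32, d)) + 0.5, rng.normal(size=(32, d)))
print(steer(rng.normal(size=d), v))
```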
Security Assurance in the Age of Generative AI
Tom Grzelak
Kara Olive
Moni Pande
Google, 1600 Amphitheatre Parkway, Mountain View, CA, 94043 (2025)
Abstract
Artificial Intelligence (AI) is a rapidly growing field known for experimentation and quick iteration, qualities that can pose challenges for traditional enterprise security approaches. Because AI introduces unique assets and surfaces—AI-driven applications, agents, assistants, vast training datasets, the models themselves, and supporting infrastructure—we’re continually updating our security controls, guided by Google’s Secure AI Framework (SAIF).
To address the new challenges, we’ve expanded our traditional security approaches to cover the new attack surfaces by scanning for more types of vulnerabilities, analyzing more intel, preparing to respond to new kinds of incidents, and continually testing our controls in novel ways to strengthen our security posture.
This white paper is one of a series describing our approaches to implementing Google's SAIF. In this paper we explain how we're applying security assurance—a cross-functional effort aiming to achieve high confidence that our security features, practices, procedures, controls, and architecture accurately mediate and enforce our security policies—to AI development. Security assurance efforts help to both ensure the continued security of our AI products and address relevant policy requirements.
Just as quality assurance (QA) in manufacturing meticulously examines finished products and the processes that create them to ensure they meet quality standards, security assurance serves a complementary role to the broader security efforts within an organization. Those broader security efforts span the design, implementation, and operation of controls to create secure software products; security assurance focuses on verifying and improving those efforts. Security assurance identifies gaps, weaknesses, and areas where controls may not be operating as intended, to drive continuous improvement across all security domains. It’s two-party review in action—security assurance helps build confidence that the software was not just built securely, but continues to run securely.
Since AI systems—those that use AI models for reasoning—present a combination of well understood and novel risks, AI technologies require a combination of both common and novel controls. No matter how strong these controls are, a security assurance program is essential to ensure they are working as intended and that they are continually updated and improved.
The paper opens with an overview of security assurance functions, covering several teams and capabilities that work together to ensure security controls are working across any software development lifecycle, including the AI development lifecycle. In particular, we focus on four functions (Red Teaming, Vulnerability Management, Detection & Response, and Threat Intelligence) and how they work together to address issues through Remediation.
We then describe the features specific to AI that affect assurance functions and give examples of how we’re adapting our approaches to account for AI-specific technologies and risks. We also include guidance for organizations considering creating their own AI assurance programs, including best practices for assuring training data, models, the AI software supply chain, and product integrations.
We intend this paper to be useful for a broad technical audience, including both assurance specialists who are new to AI technologies, and AI developers who are new to assurance practices.
Abstract
We present a scalable and agile approach for ads image content moderation at Google, addressing the challenges of moderating massive volumes of ads with diverse content and evolving policies. The proposed method utilizes human-curated textual descriptions and cross-modal text-image co-embeddings to enable zero-shot classification of policy-violating ads images, bypassing the need for extensive supervised training data and human labeling. By leveraging large language models (LLMs) and user expertise, the system generates and refines a comprehensive set of textual descriptions representing policy guidelines. During inference, co-embedding similarity between incoming images and the textual descriptions serves as a reliable signal for policy violation detection, enabling efficient and adaptable ads content moderation. Evaluation results demonstrate the efficacy of this framework in significantly boosting the detection of policy-violating content.
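A minimal sketch of the zero-shot scoring step follows; the embedding vectors, policy names, and threshold are placeholders, since the production co-embedding model is not described here.

```python
# Illustrative sketch only: flag an ad image when its embedding is close to
# the co-embedded textual description of a policy violation.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def violates_policy(image_emb: np.ndarray,
                    policy_text_embs: dict[str, np.ndarray],
                    threshold: float = 0.3) -> list[str]:
    """Return every policy whose textual description is similar to the image."""
    return [name for name, emb in policy_text_embs.items()
            if cosine(image_emb, emb) >= threshold]

rng = np.random.default_rng(2)
policies = {"weapons": rng.normal(size=64), "tobacco": rng.normal(size=64)}
print(violates_policy(rng.normal(size=64), policies))
```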
ExfilState: Automated Discovery of Timer-Free Cache Side Channels on ARM CPUs
Fabian Thomas
Michael Torres
Michael Schwarz
ACM Conference on Computer and Communications Security (CCS) (2025) (to appear)
Online Bidding under RoS Constraints without Knowing the Value
Sushant Vijayan
Swati Padmanabhan
The Web Conference (2025)
Abstract
We consider the problem of auto-bidding in online advertising from the perspective of a single advertiser. The goal of the advertiser is to maximize their value under the Return-on-Spend (RoS) constraint, with performance measured in terms of regret against the optimal offline solution that knows all queries a priori. Importantly, the value of the item is unknown to the bidder ahead of time. The goal of the bidder is to quickly identify the optimal bid while simultaneously satisfying budget and RoS constraints. Using a simple UCB-style algorithm, we provide the first result that achieves optimal regret and constraint violation for this problem.
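A hedged sketch of a generic UCB-style bidder for an unknown value follows; the paper's actual algorithm and its budget/RoS constraint accounting are more subtle than this toy.

```python
# Illustrative sketch only: bid optimistically based on an upper confidence
# bound on the unknown item value, scaled by the required RoS ratio.
import math

class UCBBidder:
    def __init__(self, target_ros: float):
        self.target_ros = target_ros  # required value-per-spend ratio
        self.n, self.value_sum = 0, 0.0

    def bid(self) -> float:
        if self.n == 0:
            return 1.0  # explore on the first round
        ucb = self.value_sum / self.n + math.sqrt(2 * math.log(self.n + 1) / self.n)
        return ucb / self.target_ros  # spend at most (optimistic value) / RoS

    def observe(self, realized_value: float) -> None:
        self.n += 1
        self.value_sum += realized_value

bidder = UCBBidder(target_ros=1.0)
for v in [0.4, 0.6, 0.5]:
    b = bidder.bid()
    bidder.observe(v)
    print(f"bid={b:.2f}, observed value={v}")
```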
DORA Impact of Generative AI in Software Development
Derek DeBellis
Daniella Villalba
DORA, Google (2025)
Abstract
Generative AI is transforming how software is built, offering unprecedented opportunities and raising new challenges. Based on extensive research and developer interviews, this DORA report provides a nuanced understanding of AI's impact on individuals, teams, and organizations.
Fast Tensor Completion via Approximate Richardson Iteration
Mehrdad Ghadiri
Yunbum Kook
Ali Jadbabaie
Proceedings of the 42nd International Conference on Machine Learning (2025)
Abstract
We study tensor completion (TC) through the lens of low-rank tensor decomposition (TD). Many TD algorithms use fast alternating minimization methods, which solve highly structured linear regression problems at each step (e.g., for CP, Tucker, and tensor-train decompositions). However, such algebraic structure is lost in TC regression problems, making direct extensions unclear. To address this, we propose a lifting approach that approximately solves TC regression problems using structured TD regression algorithms as blackbox subroutines, enabling sublinear-time methods. We theoretically analyze the convergence rate of our algorithm based on approximate Richardson iteration, and we demonstrate on real-world tensors that its running time can be 100x faster than direct methods for CP completion.
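A toy sketch of the approximate Richardson idea follows, with a cheap diagonal preconditioner standing in for the paper's structured TD regression subroutines.

```python
# Illustrative sketch only: refine a solution with the iteration
# x_{k+1} = x_k + M(b - A x_k), where M only approximately inverts A.
import numpy as np

def approx_richardson(A, b, approx_solve, iters=50):
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + approx_solve(b - A @ x)  # correct using the approximate solver
    return x

rng = np.random.default_rng(3)
A = np.diag(rng.uniform(1, 2, size=5)) + 0.05 * rng.normal(size=(5, 5))
b = rng.normal(size=5)
M = lambda r: r / np.diag(A)  # toy "blackbox": diagonal approximate inverse
print(np.linalg.norm(A @ approx_richardson(A, b, M) - b))  # small residual
```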
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang
Steven Zheng
Swaroop Mishra
Yuwei Zhang
Anush Mattapalli
Ankur Taly
Jingbo Shang
ICLR 2025
Abstract
Retrieval augmented generation (RAG) has attracted significant attention in both academia and industry for its ability to insert timely and accurate evidence into the generation process of large language models. However, introducing retrieved evidence makes the input prompt substantially longer, which can degrade the understanding quality of large language models and slow them down in actual usage scenarios. To address these issues, we propose Speculative RAG, which leverages a smaller LLM to conduct retrieval augmented generation on behalf of a larger LLM. The smaller LLM digests a few pieces of evidence and rapidly generates multiple drafts in parallel, and these drafts are then verified by the larger LLM to guarantee quality. We achieve higher speed as well as better quality in the RAG results.
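A minimal sketch of the draft-then-verify flow follows; `small_lm` and `large_lm_score` are hypothetical stand-ins for the drafter and verifier models, not the paper's actual components.

```python
# Illustrative sketch only: draft answers in parallel with a small model,
# then keep the draft the large model scores highest.
from concurrent.futures import ThreadPoolExecutor

def small_lm(query: str, evidence: list[str]) -> str:
    return f"draft from {len(evidence)} docs"  # placeholder drafter

def large_lm_score(query: str, draft: str) -> float:
    return len(draft) * 0.01  # placeholder verifier score

def speculative_rag(query: str, evidence_subsets: list[list[str]]) -> str:
    with ThreadPoolExecutor() as pool:  # drafts generated in parallel
        drafts = list(pool.map(lambda ev: small_lm(query, ev), evidence_subsets))
    return max(drafts, key=lambda d: large_lm_score(query, d))

subsets = [["doc1", "doc2"], ["doc3"], ["doc4", "doc5", "doc6"]]
print(speculative_rag("when was X founded?", subsets))
```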
Deep Researcher with Test-time Diffusion
Guan Sun
Zoey CuiZhu
Yuanjun (Sophia) Bi
Weiming Wen
Hui Wan
Chunfeng Wen
Solène Maître
George Lee
Vishy Tirumalashetty
Emily Xue
Burak Gokturk
2025
Abstract
Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time scaling algorithms. Drawing inspiration from the iterative nature of human research, which involves cycles of searching, reasoning, and revision, we propose the Test-Time Diffusion Deep Researcher (TTD-DR). This novel framework conceptualizes research report generation as a diffusion process. TTD-DR initiates this process with a preliminary draft, an updatable skeleton that serves as an evolving foundation to guide the research direction. The draft is then iteratively refined through a "denoising" process, which is dynamically informed by a retrieval mechanism that incorporates external information at each step. The core process is further enhanced by a self-evolutionary algorithm applied to each component of the agentic workflow, ensuring the generation of high-quality context for the diffusion process. This draft-centric design guides the report writing process to be more timely and coherent while reducing information loss during the iterative search process. We demonstrate that our TTD-DR achieves state-of-the-art results on a wide array of benchmarks that require intensive search and multi-hop reasoning, significantly outperforming existing deep research agents.
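A minimal sketch of the draft-as-diffusion loop follows; `retrieve` and `revise` are hypothetical stand-ins for the framework's retrieval and LLM revision components.

```python
# Illustrative sketch only: start from a preliminary skeleton draft and
# iteratively "denoise" it, with retrieval informing each step.
def retrieve(draft: str, step: int) -> str:
    return f"<evidence for step {step}>"  # placeholder retrieval

def revise(draft: str, evidence: str) -> str:
    return draft + f" [refined with {evidence}]"  # placeholder denoiser

def ttd_dr(question: str, steps: int = 3) -> str:
    draft = f"Preliminary skeleton for: {question}"  # the updatable skeleton
    for step in range(steps):
        evidence = retrieve(draft, step)  # retrieval informs each step
        draft = revise(draft, evidence)   # one "denoising" update
    return draft

print(ttd_dr("survey of test-time scaling"))
```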
StreetReaderAI: Making Street View Accessible Using Context-Aware Multimodal AI
Alex Fiannaca
Nimer Jaber
Victor Tsaran
Proceedings of the 2025 ACM Symposium on User Interface Software and Technology (UIST'25) (to appear)
Abstract
Interactive streetscape mapping tools such as Google Street View (GSV) and Meta Mapillary enable users to virtually navigate and experience real-world environments via immersive 360° imagery but remain fundamentally inaccessible to blind users. We introduce StreetReaderAI, the first accessible street view tool, which combines context-aware multimodal AI, accessible navigation controls, and conversational speech. With StreetReaderAI, blind users can virtually examine destinations, engage in open-world exploration, or tour any of the over 220 billion images spanning the 100+ countries and territories where GSV is deployed. We iteratively designed StreetReaderAI with a mixed visual-ability team and performed an evaluation with eleven blind users. Our findings demonstrate the value of an accessible street view in supporting point-of-interest (POI) investigations and remote route planning. We close by enumerating key guidelines for future work.