Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11300 publications
    Improved Differentially Private Algorithms for Rank Aggregation
    Phanu Vajanopath
    Quentin Hillebrand
    Vorapong Suppakitpaisarn
    AAAI (2026)
    Preview abstract Rank aggregation is a task of combining the rankings of items from multiple users into a single ranking that best represents the users' rankings. Alabi et al. (AAAI'22) presents differentially-private (DP) polynomial-time approximation schemes (PTASes) and 5-approximation algorithms with certain additive errors for the Kemeny rank aggregation problem in both central and local models. In this paper, we present improved DP PTASes with smaller additive error in the central model. Furthermore, we are first to study the footrule rank aggregation problem under DP. We give a near-optimal algorithm for this problem; as a corollary, this leads to 2-approximation algorithms with the same additive error as the 5-approximation algorithms of Alabi et al. for the Kemeny rank aggregation problem in both central and local models. View details
    Beyond Vector Similarity: Hierarchical Context-Aware Graph RAG vs Standard RAG in Enterprise Code Migration
    Suddhasatwa Bhaumik
    Nilesh Jaiswal
    Arjit Shukla
    Divya Malhotra
    Aniket Agrawal
    Saurabh Garg
    Suchit Puri
    Google Cloud India, Google, S. No, AP81, 83, N Main Rd, near Hard Rock Cafe, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411036 (2026)
    Preview abstract As enterprises modernize legacy systems (e.g., monolithic Java architectures to Python microservices), Large Language Models (LLMs) have become instrumental in automated code translation. However, traditional vector-based Retrieval-Augmented Generation (Standard RAG) struggles with topological relationships, fetching isolated text chunks that frequently sever inheritance chains and lead to high compilation failure rates. This paper presents a comparative analysis between Standard RAG and a novel Hierarchical Context-Resident Graph (HCRG) methodology. Our pipeline utilizes tree-sitter for polyglot Abstract Syntax Tree (AST) extraction, mapping architectural edges into a Google Cloud Spanner Property Graph, and serializing this structure into a Gemini (on Vertex AI) Context Cache to enable topological, parent-first code translation. By shifting evaluation from naive text-overlap to a custom 7-metric framework measuring Software Engineering (SE) utility, empirical evaluations on the spring-petclinic-genai repository demonstrate significant structural improvements. Graph RAG decisively mitigates dependency loss, dropping the API hallucination rate from 56.4% to 16.2%. Furthermore, it improves Dependency Resolution Quality (DRQ) from 34.8% to 65.9% and enhances Parent-Child Consistency (PCC) from 26.7% to 45.5%. Interestingly, traditional lexical metrics fail to capture this divergence; both methodologies achieved an identical 91% average CodeBLEU score, effectively masking Standard RAG’s structural failures behind syntactically plausible but broken code. However, the results indicate that Graph RAG is not strictly superior across all dimensions. Providing the LLM with dense, global structural context introduces new vulnerabilities: Graph RAG suffers a severe degradation in Cyclomatic Complexity Consistency (dropping from Standard RAG’s 71.6% to 46.7%) due to defensive over-engineering by the LLM, alongside a slight drop in Docstring Preservation (67.0% down to 61.0%) caused by prompt attention dilution. Ultimately, this research validates that while Graph RAG trades an increase in code complexity for critical reductions in API hallucinations, it offers a substantially more viable and architecturally sound path for automated enterprise codebase modernisation. View details
    Preview abstract The rapid adoption of agentic systems powered by large language models (LLMs) introduces significant security challenges distinct from plain conversational models, particularly concerning prompt injection and tool misuse due to their dynamic personas and real- world tool interactions. This paper investigates the effectiveness of hardened security prompting in a task-oriented multi-agent framework, using a coding assistant as a representative case study. We com- pare a baseline ”unhardened” agent against a ”hard- ened” version equipped with explicit security guide- lines applied across all sub-agents. Our evaluation across 150+ single-turn and 32 multi-turn attack sce- narios demonstrates that prompt hardening dramat- ically improves resilience. With a simple, approxi- mately 500-token security hardener, single-turn fail- ure rates dropped from 19.48% to 2.60%, while multi- turn failure rates decreased from 75.00% to 46.88%. Furthermore, we show that successfully bypassing the hardened agent requires significantly more adversar- ial effort and a greater number of chat turns. How- ever, the analysis also reveals a critical shift in vul- nerability taxonomy: as direct attacks fail, adver- saries exploit the agent’s core functionality via ”Func- tional Wrappers” (Intent Obfuscation), highlighting a residual risk that necessitates a shift in the defen- sive paradigm from static filters to dynamic runtime state and intent analysis. View details
    Compact Conformal Subgraphs
    Kamesh Munagala
    Aravindan Vijayaraghavan
    ICML (2026)
    Preview abstract Conformal prediction provides rigorous uncertainty guarantees for model outputs but can produce prohibitively large prediction sets in structured domains such as routing, planning, or sequential recommendation. We introduce \emph{graph-based conformal compression}, a framework for constructing compact subgraphs that preserve the statistical validity of conformal prediction while reducing structural complexity. We study a formulation that selects a smallest subgraph capturing a prescribed fraction of conditional probability mass, and reduce to a weighted version of densest $k$-subgraphs in hypergraphs, in the regime where the subgraph has a large fraction of edges. We design efficient approximation algorithms that achieve constant factor coverage and size trade-offs. Our results highlight an algorithmic regime, distinct from classical densest-$k$-subgraph hardness settings, where the problem can be approximated efficiently, bridging conformal prediction with combinatorial graph compression. We finally validate our algorithmic approach on synthetic and real-world instances of trip planning and navigation, showing in each case that our approach handily beats natural baselines. View details
    Preview abstract The advent of 3D Gaussian Splatting has revolutionized graphics rendering by offering high visual quality and fast rendering speed. However, training large-scale scenes at high quality remains challenging due to the substantial memory demands required to store Gaussians and optimizer states. To address these limitations, we propose GS-Offload, fast and memory-efficient training system for 3D Gaussian Splatting. GS-Offload stores Gaussians and optimizer states in host memory and selectively transfer only the necessary data to GPU memory on demand, significantly reducing GPU memory usage. With carefully designed software pipelining and CPU-side optimizer acceleration, GS-Offload achieves training speed near that of GPU-only setups, while significantly lowering GPU memory demands. View details
    Preview abstract Managing compiler build errors that can arise during infrastructure upgrades in large, polyglot codebases may be challenging, as manual remediation can be slow and some automated tools may not support modern language syntax. A system can provide automated error remediation by ingesting compiler diagnostics and analyzing source code using an Abstract Syntax Tree (AST). A recursive scope resolution algorithm, for example, can traverse the AST to identify a specific and narrowly-scoped code block at which to apply an error suppression. Conversely, this algorithmic complexity can be bypassed when lexical scope resolution is not required, and the system can identify the specific location of error suppressions directly from the error's exact coordinates. The system may then generate and apply language-specific patches, such as structured comments for JavaScript source files or line-scoped comments for TypeScript source files, for example, by using a transactional rewrite engine. This approach can provide a scalable method for managing automated code remediation, which may facilitate infrastructure upgrades by reducing the need for manual intervention. View details
    Preview abstract The emergence of Agentic AI—autonomous systems capable of reasoning, decision-making, and multi-step execution—represents a paradigm shift in enterprise technology. Moving beyond simple generative tasks, these agents offer the potential to solve long-standing industry pain points, with over 90% of enterprises planning integration within the next three years. However, the transition from successful proof-of-concept (PoC) to a resilient, production-grade system presents significant hurdles. This article categorizes these challenges into three primary domains: Technical and Engineering Hurdles: Issues such as "entangled workflows" that complicate debugging, the struggle to maintain output quality and mitigate hallucinations, and the unpredictability caused by shifting underlying models or data sources. People, Process, and Ecosystem Hurdles: The high operational costs and unclear ROI of large models, the necessity of a new "Agent Ops" skillset, the complexity of integrating agents with disparate enterprise systems, and a rapidly evolving regulatory landscape. The Pace of Change and Security risks: The technical debt incurred by shifting software frameworks and the expanded attack surface created by autonomous agents. The article concludes that successful deployment requires a shift from informal "vibe-testing" to rigorous engineering discipline. By adopting code-first frameworks, establishing robust evaluation metrics (KPIs), and prioritizing functional deployment over theoretical optimization, organizations can effectively manage the lifecycle of Agentic AI and realize its transformative business value. View details
    Preview abstract We introduce AASE (Activation-based AI Safety Enforcement), a framework for post-perception safety monitoring in large language models. Unlike pre-perception approaches that analyze input or output text, AASE monitors the model's internal activation patterns—what the model "understands" rather than what text it processes or generates—enabling detection of safety-relevant states before harmful outputs are produced. The framework comprises three techniques: Activation Fingerprinting (AF) for harmful content detection, Agent Action Gating (AAG) for prompt injection defense, and Activation Policy Compliance (APC) for enterprise policy enforcement. We introduce paired contrastive training to isolate safety-relevant signals from confounding factors such as topic and style, addressing signal entanglement in polysemantic activations. Validation across 7 models from 3 architecture families shows strong class separation: Gemma-2-9B achieves AUC 1.00 with 7.2σ separation across all probes; AAG achieves AUC ≥0.88 across all models on the InjecAgent benchmark; APC achieves 0.97-1.00 AUC across three enterprise policies. Model size correlates with probe quality—Gemma-2-9B (7.2σ separation) outperforms Gemma-2-2B (4.3σ). All techniques survive INT4 quantization with minimal separation degradation. AASE is 9× faster than Llama Guard 3 (33ms vs 306ms) with higher TPR (88% vs 50%) at a tunable threshold that trades FPR for detection sensitivity, adding only 0.002ms probe overhead to existing inference. View details
    Preview abstract This writeup defines the Hydration Proxy Pattern, a framework for building stateful conversational data systems over stateless LLM APIs. It describes a platform-agnostic approach to decoupling persistence from the AI provider through secure server-side intermediation and hybrid storage tiers. The abstract provides a blueprint for managing the "Persistence Gap" in enterprise AI integrations, detailing high-level strategies for session history management, streaming, and multi-stage semantic grounding without disclosing specific internal implementation details. View details
    Preview abstract The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent—specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device bind-ing cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market. View details
    Taming the Variants Multi-Architecture Continuous Testing at Google
    Sushmita Azad
    Chandrakanth Chittappa
    Ali Esmaeeli
    Laura Macaddino
    Sam Manfreda
    David Margolin
    Dharma Naidu
    Sabuj Pattanayek
    Sachin Sable
    Ruslan Sakevych
    Dushyant Acharya
    Adrian Berding
    Kevin Crossan
    Wolff Dobson
    Abhay Singh
    19th IEEE International Conference on Software Testing, Verification and Validation (ICST) 2026, Daejeon, Republic of Korea, IEEE
    Preview abstract Enterprises are increasingly adopting multiple general-purpose computer architectures in the data center. This leads to new testing challenges as it creates demand to qualify the software for the additional architectures. Naively double-testing all software for both architectures is costly and unnecessary. Further, reconfiguring CI/CD to take advantage of the new architecture can be non-trivial at scale. This paper introduces CI/CD variants and an optimized testing cycle to solve these twin challenges. We empirically evaluate our solution's impact on human and machine expenses using 44k projects at Google on real production data. First, we estimate saving ~25% of machine expenses at the negligible cost of a few delayed breakage detections per day. Second, we estimate a 90+% reduction in human cost for migrating the configuration. All features described in this paper are now Generally Available at Google and we report this as an empirical case study in scaling CI/CD to new architectures. View details
    Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
    Michele Marazzi
    Kaveh Razavi
    Salman Qazi
    Diego Meyer
    Patrick Jattke
    IEEE Security & Privacy (S&P) (2026)
    Preview abstract In modern Kubernetes environments, eBPF (Extended Berkeley Packet Filter) has become the de facto standard for high-performance dataplane enforcement. However, this architecture introduces a complex distributed state problem: the asynchronous synchronization between the Kubernetes control plane (Intent) and the kernel-space BPF maps (Reality). A critical failure mode, termed “Silent Divergence,” occurs when the control plane believes a network policy or identity is applied, but the underlying kernel state is missing or corrupted. In this “Gray Failure” state, standard observability tools—including logs, liveness probes, and agent status checks—report health, while the network silently drops traffic. This paper introduces eBPF-Auditor, a specialized consistency verification framework. Unlike standard agents that rely on event-based reconciliation, eBPF-Auditor performs a periodic “Two-Way State Audit” that mathematically verifies the intersection of Kubernetes Intent and BPF Reality. We demonstrate through fault injection and benchmarks on 5,000 pods that this approach successfully detects state drift with 100% accuracy and negligible sub-millisecond overhead (ms), making it a viable solution for high-frequency runtime verification in production hyperscale clusters. View details
    ARM MTE Performance in Practice
    Taehyun Noh
    Yingchen Wang
    Tal Garfinkel
    Mahesh Madhav
    Mattan Erez
    Shravan Narayan
    Usenix Security (2026)
    Neural general circulation models for modeling precipitation
    Stephan Hoyer
    Dmitrii Kochkov
    Janni Yuval
    Ian Langmore
    Science Advances (2026)
    Preview abstract Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. While hybrid models combining machine learning and physics have emerged with the premise of improving precipitation simulations, none have proven sufficiently skillful or stable enough to outperform existing models in simulating precipitation. Here, we present the first hybrid model that is trained directly on precipitation observations. The model runs at 2.8 degrees resolution and is built on the differentiable NeuralGCM framework. This model is stable for decadal simulations and demonstrates significant improvements over existing GCMs, ERA5 reanalysis, and a Global Cloud-Resolving Model in simulating precipitation. Our approach yields reduced biases, a more realistic precipitation distribution, improved representation of extremes, and a more accurate diurnal cycle. Furthermore, it outperforms the ECMWF ensemble for mid-range weather forecasting. This advance paves the way for more reliable simulations of current climate and for the ability to fully utilize the abundance of existing observations to further improve GCMs. View details
    ×