Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11258 publications
    Beyond Vector Similarity: Hierarchical Context-Aware Graph RAG vs Standard RAG in Enterprise Code Migration
    Suddhasatwa Bhaumik
    Nilesh Jaiswal
    Arjit Shukla
    Divya Malhotra
    Aniket Agrawal
    Saurabh Garg
    Suchit Puri
    Google Cloud India, Google, S. No, AP81, 83, N Main Rd, near Hard Rock Cafe, Koregaon Park Annexe, Mundhwa, Pune, Maharashtra 411036 (2026)
    Preview abstract As enterprises modernize legacy systems (e.g., monolithic Java architectures to Python microservices), Large Language Models (LLMs) have become instrumental in automated code translation. However, traditional vector-based Retrieval-Augmented Generation (Standard RAG) struggles with topological relationships, fetching isolated text chunks that frequently sever inheritance chains and lead to high compilation failure rates. This paper presents a comparative analysis between Standard RAG and a novel Hierarchical Context-Resident Graph (HCRG) methodology. Our pipeline utilizes tree-sitter for polyglot Abstract Syntax Tree (AST) extraction, mapping architectural edges into a Google Cloud Spanner Property Graph, and serializing this structure into a Gemini (on Vertex AI) Context Cache to enable topological, parent-first code translation. By shifting evaluation from naive text-overlap to a custom 7-metric framework measuring Software Engineering (SE) utility, empirical evaluations on the spring-petclinic-genai repository demonstrate significant structural improvements. Graph RAG decisively mitigates dependency loss, dropping the API hallucination rate from 56.4% to 16.2%. Furthermore, it improves Dependency Resolution Quality (DRQ) from 34.8% to 65.9% and enhances Parent-Child Consistency (PCC) from 26.7% to 45.5%. Interestingly, traditional lexical metrics fail to capture this divergence; both methodologies achieved an identical 91% average CodeBLEU score, effectively masking Standard RAG’s structural failures behind syntactically plausible but broken code. However, the results indicate that Graph RAG is not strictly superior across all dimensions. Providing the LLM with dense, global structural context introduces new vulnerabilities: Graph RAG suffers a severe degradation in Cyclomatic Complexity Consistency (dropping from Standard RAG’s 71.6% to 46.7%) due to defensive over-engineering by the LLM, alongside a slight drop in Docstring Preservation (67.0% down to 61.0%) caused by prompt attention dilution. Ultimately, this research validates that while Graph RAG trades an increase in code complexity for critical reductions in API hallucinations, it offers a substantially more viable and architecturally sound path for automated enterprise codebase modernisation. View details
    Who Controls the Curriculum for AI? The Limits of Participatory Design for Educational AI
    Michael Madaio
    Learning Under Algorithmic Conditions, University of Minnesota Press (2026)
    Preview abstract Participatory design is a long-standing effort to shift control over technology design from technologists to users and communities impacted by technologies. For educational AI, this means involving students, families, teachers, and other stakeholders in shaping the design of AI systems. While promising, in this article, I situate the recent calls for participatory design of educational AI systems within a different historical tradition—that of contests over local control of educational curricula. I argue that approaches that attempt to steer the design and development of educational AI through participatory methods may inadvertently reproduce the history of political contestation of educational curricula, in ways that may privilege the most powerful communities, rather than those inequitably impacted. What might it look like to treat participatory AI design as a site for political contestation? How might these approaches avoid reproducing the same majoritarian tendencies that led to educational inequities in the first place? View details
    Preview abstract Responsive user interfaces enable dynamically adjusting user interfaces based on device-specific aspects such as screen size, aspect ratio, display resolution, etc. However, traditional responsive design fails to account for different types of constraints of a user and task criticality of the task being performed via the UI. Misalignment between the UI design, user context and task criticality can lead to user error. This disclosure describes techniques, implemented with user permission, for dynamically modifying the layout, information density, and/or interactive physics of a user interface based on a dual-factor analysis of user cognitive state and task criticality. The user's cognitive state can be inferred from behavioral telematics. Task criticality can be inferred from semantic analysis. The information density and other parameters of a user interface are automatically adjusted based on such analyses. Such adjustments include applying or relaxing restrictions on interactivity and adjusting visual prominence of various UI elements to adjust the information density of the user interface. The adjustments can also include adjusting friction as appropriate, hiding certain aspects of the user interface, or other types of adjustments. View details
    Managing and Securing Google's Fleet of Multi-Node Servers
    Richard Hanley
    Havard Skinnemoen
    Andrés Lagar-Cavilla
    Michael Wong
    Jeff Andersen
    Kishan Prasad
    Patrick Leis
    Shiva Rao
    Chris Koch
    Jad Baydoun
    Anna Sapek
    Communications of the ACM, 69:3 (2026), pp. 82 - 92
    Preview abstract Server hardware and software co-design for a secure, efficient cloud. View details
    An Empirical Study of Tablet Ergonomics: The Interplay of Temperature, Orientation, and Use Behaviors
    Carmen Van Ommen
    Mikki Phan
    Arun Raghupathy
    Daniel Huynh
    Barbara Chaparro
    Ergonomics in Design: The Quarterly of Human Factors Applications (2026)
    Preview abstract To balance computational performance with thermal comfort, this study explores a consolidated hotspot architecture at the top center of a tablet. We tested hotspot (39°C, 43°C, 45°C, 47°C) and ambient temperatures (25°C, 35°C) with 60 participants, measuring perception, action likelihood, and expectation. The hotspot was observed away from high contact areas, with 43°C identified as the threshold for significant discomfort. Discomfort increased with portrait mode use and higher device and ambient temperatures, while active use duration influenced acceptability. The findings underscore the importance of thermal mapping and contextual sensing, with direct applications for software throttling thresholds of coated aluminum enclosures. View details
    MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
    Sieun Kim
    Qianhui Zheng
    Ruoyu Xu
    Ravi Tejasvi
    Anuva Kulkarni
    Junyi Zhu
    2026
    Preview abstract In Extended Reality (XR), complex acoustic environments often overwhelm users, compromising both scene awareness and social engagement due to entangled sound sources. We introduce MoXaRt, a real-time XR system that uses audio-visual cues to separate these sources and enable fine-grained sound interaction. MoXaRt's core is a cascaded architecture that performs coarse, audio-only separation in parallel with visual detection of sources (e.g. faces, instruments). These visual anchors then guide refinement networks to isolate individual sources, separating complex mixes of up to five concurrent sources (e.g. two voices + three instruments) with ca. 2 second processing latency. We validate MoXaRt through a technical evaluation on a new, complex dataset we collected, and a 22-participant user study. Our results demonstrate that MoXaRt significantly improves communication clarity—boosting listening comprehension in noisy conditions by 33.2% (p=0.0058)—and significantly reduces cognitive load (M=7.50 vs. M=3.36, p<0.001), paving the way for more perceptive and socially adept XR experiences. View details
    Preview abstract The rapid adoption of agentic systems powered by large language models (LLMs) introduces significant security challenges distinct from plain conversational models, particularly concerning prompt injection and tool misuse due to their dynamic personas and real- world tool interactions. This paper investigates the effectiveness of hardened security prompting in a task-oriented multi-agent framework, using a coding assistant as a representative case study. We com- pare a baseline ”unhardened” agent against a ”hard- ened” version equipped with explicit security guide- lines applied across all sub-agents. Our evaluation across 150+ single-turn and 32 multi-turn attack sce- narios demonstrates that prompt hardening dramat- ically improves resilience. With a simple, approxi- mately 500-token security hardener, single-turn fail- ure rates dropped from 19.48% to 2.60%, while multi- turn failure rates decreased from 75.00% to 46.88%. Furthermore, we show that successfully bypassing the hardened agent requires significantly more adversar- ial effort and a greater number of chat turns. How- ever, the analysis also reveals a critical shift in vul- nerability taxonomy: as direct attacks fail, adver- saries exploit the agent’s core functionality via ”Func- tional Wrappers” (Intent Obfuscation), highlighting a residual risk that necessitates a shift in the defen- sive paradigm from static filters to dynamic runtime state and intent analysis. View details
    Preview abstract Validating conversational artificial intelligence (AI) for regulated medical software applications may present challenges, as static test datasets and manual review may be limited in identifying emergent, conversational anomalies. A multi-agent AI system may be configured in a closed-loop for automated validation. The system can, for example, utilize an end user persona simulator agent to generate prompts for a target model and a domain /regulatory expert adjudicator agent to evaluate the target model’s responses against a configurable rubric. A meta-analysis agent can analyze anomalies to identify underlying vulnerabilities, which may then be used to programmatically synthesize new adversarial personas. This adaptive process can generate evidence to support regulatory compliance and continuous performance monitoring for medical software algorithms systems. View details
    Preview abstract Enterprise service delivery platforms, while vital for HR operations, create significant challenges in managing the risks of Personally Identifiable Information (PII) exposure. The integration of Generative AI offers new efficiencies but also amplifies these risks. Existing solutions—ranging from manual redaction and rule-based Data Loss Prevention (DLP) to inflexible data masking—fail to provide a nuanced, integrated approach. This paper introduces the Dual-Mode Privacy Guard (DMPG), a conceptual framework that establishes a model for Augmented Compliance. The framework provides a "defense-in-depth" strategy built on three pillars: (1) a Zero-Trust AI Foundation leveraging a verifiable, non-retention API gateway to ensure data privacy; (2) a proactive "Guardrail" that uses AI to detect and flag potential PII for human-in-the-loop review; and (3) an on-demand "Tool" that allows users to create securely anonymized data assets. By differentiating between proactive monitoring and reactive utility, the DMPG shifts the compliance paradigm from a manual burden to an AI-assisted process that enhances, rather than replaces, human oversight. This paper details the framework’s platform-agnostic architecture, using Salesforce as a reference implementation, and argues for its novelty as a model for operationalizing privacy principles within modern enterprise systems. View details
    From Correctness to Collaboration: A Human-Centered Taxonomy of AI Agent Behavior in Software Engineering
    Sherry Y. Shi
    Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26), ACM, New York, NY, USA (2026)
    Preview abstract The ongoing transition of Large Language Models in software engineering from code generators into autonomous agents requires a shift in how we define and measure success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable agent behaviors, synthesized from 91 sets of user-defined rules for coding agents. We identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the User. These findings offer a concrete vocabulary for agent behavior, enabling researchers to move beyond correctness-only benchmarks and design evaluations that reflect the realities of professional software development in large enterprises. View details
    Preview abstract Some artificial intelligence provisioning models that function as tools for human users or rely on labor arbitrage can present challenges for organizations, such as managing personnel rather than task outcomes and introducing data security risks. An architecture is described for an outcome-based synthetic labor market in which autonomous computational agents can be compensated based on verified task completion. The framework can leverage trusted execution environments to create secure hardware enclaves for processing sensitive data, which can render the data cryptographically inaccessible to a host system or agent provider. This approach can facilitate a secure, transactional market for autonomous professional execution, which may enable a shift from managing labor resources to procuring verified outcomes from a pool of specialized agents. View details
    Preview abstract The major mobile platforms, Android and iOS, have introduced changes that restrict user tracking to improve user privacy, yet apps continue to covertly track users via device fingerprinting. We study the opportunity to improve this dynamic with a case study on mobile fingerprinting that evaluates developers’ perceptions of how well platforms protect user privacy and how developers perceive platform privacy interventions. Specifically, we study developers’ willingness to make changes to protect users from fingerprinting and how developers consider trade-offs between user privacy and developer effort. We do this via a survey of 246 Android developers, presented with a hypothetical Android change that protects users from fingerprinting at the cost of additional developer effort. We find developers overwhelmingly (89%) support this change, even when they anticipate significant effort, yet prefer the change be optional versus required. Surprisingly, developers who use fingerprinting are six times more likely to support the change, despite being most impacted by it. We also find developers are most concerned about compliance and enforcement. In addition, our results show that while most rank iOS above Android for protecting user privacy, this distinction significantly reduces among developers very familiar with fingerprinting. Thus there is an important opportunity for platforms and developers to collaboratively build privacy protections, and we present actionable ways platforms can facilitate this. View details
    Exponential quantum advantage in processing massive classical data
    Haimeng Zhao
    Alexander Zlokapa
    John Preskill
    Hsin-Yuan (Robert) Huang
    arXiv:2604.07639 (2026)
    Preview abstract Broadly applicable quantum advantage, particularly in classical data processing and machine learning, has been a fundamental open problem. In this work, we prove that a small quantum computer of polylogarithmic size can perform large-scale classification and dimension reduction on massive classical data by processing samples on the fly, whereas any classical machine achieving the same prediction performance requires exponentially larger size. Furthermore, classical machines that are exponentially larger yet below the required size need superpolynomially more samples and time. We validate these quantum advantages in real-world applications, including single-cell RNA sequencing and movie review sentiment analysis, demonstrating four to six orders of magnitude reduction in size with fewer than 60 logical qubits. These quantum advantages are enabled by quantum oracle sketching, an algorithm for accessing the classical world in quantum superposition using only random classical data samples. Combined with classical shadows, our algorithm circumvents the data loading and readout bottleneck to construct succinct classical models from massive classical data, a task provably impossible for any classical machine that is not exponentially larger than the quantum machine. These quantum advantages persist even when classical machines are granted unlimited time or if BPP=BQP, and rely only on the correctness of quantum mechanics. Together, our results establish machine learning on classical data as a broad and natural domain of quantum advantage and a fundamental test of quantum mechanics at the complexity frontier. View details
    Preview abstract This talk addresses the challenges of operating Google's monitoring systems at scale, handling terabytes of telemetry data and preventing overload from diverse workloads. We'll explore how Google's internal client library and Monarch, its planet-scale time-series database, work together for cost-effective data collection. Key principles include a distributed push model, dynamic client-side data reduction, centralized retention, and periodic metric analysis. The session will then bridge these concepts to the open-source world, discussing our work with OpenTelemetry's OpAMP protocol to achieve similar scalable and efficient telemetry collection. Attendees will gain insights into adapting these principles for cost savings and learn about our collaboration with the OpAMP SIG to benefit the broader community. View details
    CrossCheck: Input Validation for WAN Control Systems
    Rishabh Iyer
    Isaac Keslassy
    Sylvia Ratnasamy
    Networked Systems Design and Implementation (NSDI) (2026) (to appear)
    Preview abstract We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs—often stemming from bugs in the SDN control infrastructure—CrossCheck alerts operators before they trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN, during which it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of corrupted telemetry data). View details

    Follow us

    ×