Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11197 publications
System for a Secure, Outcome-Based Synthetic Labor Market Using Trusted Execution Environments
Patent (2026)
Preview abstract
Some artificial intelligence provisioning models that function as tools for human users or rely on labor arbitrage can present challenges for organizations, such as managing personnel rather than task outcomes and introducing data security risks. An architecture is described for an outcome-based synthetic labor market in which autonomous computational agents can be compensated based on verified task completion. The framework can leverage trusted execution environments to create secure hardware enclaves for processing sensitive data, which can render the data cryptographically inaccessible to a host system or agent provider. This approach can facilitate a secure, transactional market for autonomous professional execution, which may enable a shift from managing labor resources to procuring verified outcomes from a pool of specialized agents.
View details
Preview abstract
This framework manages AI agents by establishing behavioral boundaries and a persistent identity. It uses a multi-layered stack, combining safety rules with brand guidelines, to shape an agent's reasoning. Features include authority decay to limit power if confidence drops and memory segmentation to prevent data tampering. Centralized oversight ensures these digital representatives remain aligned with company policies through continuous monitoring and testing.
View details
Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
Alyssa Sheehan
Bianca Gallardo
Ying Wang
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
Preview abstract
Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility.
View details
Preview abstract
In today's AI landscape, success depends not just on prompting large language models but on orchestrating them into intelligent systems that are scalable, compliant, and cost-effective. GenAI on Google Cloud is your hands-on guide to bridging that gap. Whether you're an ML engineer or an enterprise leader, this book offers a practical game plan for taking agentic systems from prototype to production.
Written by practitioners with deep experience in AgentOps, data engineering, and GenAI infrastructure, this guide takes you through real-world workflows from data prep and deployment to orchestration and integration. With concrete examples, field-tested frameworks, and honest insights, you'll learn how to build agentic systems that deliver measurable business value.
> Bridge the production gap that stalls 90% of vertical AI initiatives using systematic deployment frameworks
> Navigate AgentOps complexities through practical guidance on orchestration, evaluation, and responsible AI practices
> Build robust multimodal systems for text, images, and video using proven agent architectures
> Optimize for scale with strategies for cost management, performance tuning, and production monitoring
View details
An Empirical Study of Tablet Ergonomics: The Interplay of Temperature, Orientation, and Use Behaviors
Carmen Van Ommen
Mikki Phan
Arun Raghupathy
Daniel Huynh
Barbara Chaparro
Ergonomics in Design: The Quarterly of Human Factors Applications (2026)
Preview abstract
To balance computational performance with thermal comfort, this study explores a consolidated hotspot architecture at the top center of a tablet. We tested hotspot (39°C, 43°C, 45°C, 47°C) and ambient temperatures (25°C, 35°C) with 60 participants, measuring perception, action likelihood, and expectation. The hotspot was observed away from high contact areas, with 43°C identified as the threshold for significant discomfort. Discomfort increased with portrait mode use and higher device and ambient temperatures, while active use duration influenced acceptability. The findings underscore the importance of thermal mapping and contextual sensing, with direct applications for software throttling thresholds of coated aluminum enclosures.
View details
Preview abstract
When managing complex, unpredictable (non-deterministic) AI agents using simple, fixed control systems (like finite state machines), operational failures and accountability issues often arise. This document introduces a probabilistic governance and telemetry framework to resolve these problems. Instead of following a rigid sequence of steps, this framework defines a multi-dimensional operational boundary, a 'behavioral volume', and assigns the agent a goal. This allows the agent to use its own reasoning to achieve the goal while remaining within the defined boundaries. A separate telemetry layer monitors the agent's actions by calculating metrics, such as alignment scores and drift velocity, to measure how much the agent deviates from its intended behavior. This system provides a method for guiding, monitoring, and securing autonomous agents, effectively managing the performance and security of an unpredictable AI workforce in complex environments.
View details
Preview abstract
The advent of 3D Gaussian Splatting has revolutionized graphics rendering by offering high visual quality and fast rendering speed. However, training large-scale scenes at high quality remains challenging due to the substantial memory demands required to store Gaussians and optimizer states. To address these limitations, we propose GS-Offload, fast and memory-efficient training system for 3D Gaussian Splatting. GS-Offload stores Gaussians and optimizer states in host memory and selectively transfer only the necessary data to GPU memory on demand, significantly reducing GPU memory usage. With carefully designed software pipelining and CPU-side optimizer acceleration, GS-Offload achieves training speed near that of GPU-only setups, while significantly lowering GPU memory demands.
View details
Preview abstract
Source-to-source compilers may perform inefficiently by executing transpilation passes on scripts that do not contain the specific language features a pass is designed to transform, potentially leading to redundant processing. A compiler can analyze a script to generate a per-script feature map, for example, by identifying language features in its abstract syntax tree (AST). Before executing a transpilation pass, the compiler can check this map and may bypass the pass for that script if the specific feature targeted by the pass is not present. This feature map can also be dynamically updated throughout the compilation process as other passes transform the code. This method of conditional pass execution based on content-aware analysis may reduce redundant AST traversals, which could decrease overall compilation time and computational resource consumption.
View details
Preview abstract
Managing compiler build errors that can arise during infrastructure upgrades in large, polyglot codebases may be challenging, as manual remediation can be slow and some automated tools may not support modern language syntax. A system can provide automated error remediation by ingesting compiler diagnostics and analyzing source code using an Abstract Syntax Tree (AST). A recursive scope resolution algorithm, for example, can traverse the AST to identify a specific and narrowly-scoped code block at which to apply an error suppression. Conversely, this algorithmic complexity can be bypassed when lexical scope resolution is not required, and the system can identify the specific location of error suppressions directly from the error's exact coordinates. The system may then generate and apply language-specific patches, such as structured comments for JavaScript source files or line-scoped comments for TypeScript source files, for example, by using a transactional rewrite engine. This approach can provide a scalable method for managing automated code remediation, which may facilitate infrastructure upgrades by reducing the need for manual intervention.
View details
Preview abstract
Automating AI research differs from general software engineering due to computationally expensive evaluation (e.g., model training) and opaque performance attribution. Current LLM-based agents struggle here, often generating monolithic scripts that ignore execution costs and causal factors. We introduce MARS (Modular Agent with Reflective Search), a framework optimized for autonomous AI research. MARS relies on three pillars: (1) Budget-Aware Planning via cost-constrained Monte Carlo Tree Search (MCTS) to explicitly balance performance with execution expense; (2) Modular Construction, employing a "Design-Decompose-Implement" pipeline to manage complex research repositories; and (3) Comparative Reflective Memory, which addresses credit assignment by analyzing solution differences to distill high-signal insights. MARS achieves state-of-the-art performance among open-source frameworks on MLE-Bench under comparable settings, maintaining competitiveness with the global leaderboard's top methods. Furthermore, the system exhibits qualitative "Aha!" moments, where 63% of all utilized lessons originate from cross-branch transfer, demonstrating that the agent effectively generalizes insights across search paths.
View details
Silicon-Level Sovereignty: Root of Trust in AI Accelerators (Digital Trust & Policy)
https://www.dotmagazine.online (2026)
Preview abstract
As artificial intelligence (AI) transitions from experimental pilot programs to mission-critical enterprise operations, traditional software-based security frameworks are proving insufficient against sophisticated infrastructure-level threats. This article introduces the concept of Silicon-Level Sovereignty, a first-principles approach to digital trust that anchors security in the physical hardware rather than the software stack.
We examine the technical architecture of Hardware Root of Trust (RoT), specifically focusing on the roles of Trusted Platform Modules (TPMs) and Secure Enclaves in modern AI accelerators such as GPUs and TPUs. By leveraging cryptographic remote attestation, organizations can move from a model of assumed software integrity to one of verifiable hardware-level proof.
The discussion provides a comparative analysis of industry-leading implementations, including NVIDIA’s Hopper architecture [1, 2], Google’s Titan-backed TPU v5p [3, 4], and Microsoft’s Azure Boost Cerberus system [5, 6], alongside the cluster-scale trust challenges presented by ultra-large systems like xAI’s Colossus [7].
The article concludes that Silicon-Level Sovereignty is no longer an optional security feature but a foundational requirement for establishing the integrity, privacy, and multi-tenant isolation necessary for high-stakes AI workloads.
View details
A Computer Vision Problem in Flatland
Erin Connelly
Annalisa Crannell
Timothy Duff
Rekha R. Thomas
SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
Preview abstract
When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exist if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image.
View details
MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
Sieun Kim
Qianhui Zheng
Ruoyu Xu
Ravi Tejasvi
Anuva Kulkarni
Junyi Zhu
2026
Preview abstract
In Extended Reality (XR), complex acoustic environments often overwhelm users, compromising both scene awareness and social engagement due to entangled sound sources. We introduce MoXaRt, a real-time XR system that uses audio-visual cues to separate these sources and enable fine-grained sound interaction. MoXaRt's core is a cascaded architecture that performs coarse, audio-only separation in parallel with visual detection of sources (e.g. faces, instruments). These visual anchors then guide refinement networks to isolate individual sources, separating complex mixes of up to five concurrent sources (e.g. two voices + three instruments) with ca. 2 second processing latency. We validate MoXaRt through a technical evaluation on a new, complex dataset we collected, and a 22-participant user study. Our results demonstrate that MoXaRt significantly improves communication clarity—boosting listening comprehension in noisy conditions by 33.2% (p=0.0058)—and significantly reduces cognitive load (M=7.50 vs. M=3.36, p<0.001), paving the way for more perceptive and socially adept XR experiences.
View details
VISTA: A Test-Time Self-Improving Video Generation Agent
Hootan Nakhost
Xuan Long Do
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)
Preview abstract
Despite rapid advances in text-to-video (T2V) synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. To address this, we introduce VISTA, a novel multi-agent system that autonomously refines prompts to improve video generation. VISTA operates in an iterative loop, first decomposing a user's idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. To rigorously evaluate our proposed approach, we introduce MovieGen-Bench, a new benchmark of diverse single- and multi-scene video generation tasks. Experiments show that while prior methods yield inconsistent gains, VISTA consistently improves video quality, achieving up to 60% pairwise win rate against state-of-the-art baselines. Human evaluators concur, preferring VISTA's outputs in 68% of comparisons.
View details