Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11329 publications
Preview abstract The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent—specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device bind-ing cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market. View details
Preview abstract Voice activity detection (VAD) plays a vital role in enabling applications such as speech recognition. We analyze the impact of window size on the accuracy of three VAD algorithms: Silero, WebRTC, and Root Mean Square (RMS) across a set of diverse real-world digital audio streams. We additionally explore the use of hysteresis on top of each VAD output. Our results offer practical references for optimizing VAD systems. Silero significantly outperforms WebRTC and RMS, and hysteresis provides a benefit for WebRTC. View details
Diffusion Controller: Framework, Algorithms and Parameterization
Tong Yang
Moonkyung Ryu
Guy Tennenholtz
Yuejie Chi
Proceedings of the 43rd International Conference on Machine Learning (ICML-26), Seoul, South Korea (2026)
Preview abstract Controllable generation with diffusion models is often treated as a collection of heuristics rather than a unified optimization problem. We propose a principled control formulation by viewing the diffusion reverse process as an instance of a (generalized) linearly-solvable Markov decision process (LS-MDP). This perspective turns controllable generation into regularized optimal control around a pretrained diffusion policy, yielding tractable objectives and algorithmic updates. Under this framework, we study two practical finetuning regimes. When paired target data are available, we obtain a supervised finetuning (SFT) objective. When only a terminal reward model is available, we derive reinforcement-learning finetuning (RLFT) methods from the LS-MDP solution structure, including (i) a reward-weighted regression loss and (ii) a policy-gradient approach (with standard extensions such as PPO). Crucially, the LS-MDP optimality conditions imply an explicit relationship between the optimal and pretrained score functions. We leverage this to derive a new score-function parameterization that isolates the control signal and enables “gray-box” finetuning with substantially fewer trainable parameters. Experiments across SFT and RLFT show this parameterization improves over existing finetuning baselines while achieving stronger sample/parameter efficiency. View details
Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion
Patrick Jiang
Judith Li
Moonkyung Ryu
Lily Hu
Kun Su
Liam Hebert
Hao Peng
Jiawei Han
Dima Kuzmin
Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion, Seoul, South Korea (2026)
Preview abstract Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while staying grounded to a fixed database. These objectives are inherently non-decomposable, creating a training bottleneck because property-aligned (query, content) supervision is scarce. Reinforcement learning (RL) can optimize set-level objectives via interaction, but deploying an RL-tuned LLM for fan-out retrieval is expensive at query time. Diffusion-based generative retrieval enables efficient single-pass fan-out in embedding space, but requires objective-aligned training targets. We propose R4T (Retrieve-for-Train), which uses RL once as an objective transducer: (i) train a fan-out LLM with composite set-level rewards, (ii) synthesize objective-consistent training pairs, and (iii) train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. Across Polyvore and a large-scale music playlist dataset, R4T improves retrieval quality over strong baselines while reducing query-time fan-out latency by an order of magnitude. View details
Preview abstract In some multi-stage software build pipelines, downstream compiler errors may be reported against ephemeral, machine-generated intermediate artifacts rather than original, human-written source code, which can make remediation challenging. A system and method may address this by intercepting a downstream error, mapping its location back to the original source file, and programmatically injecting a dormant suppression tag into the original source code. During a subsequent build, an intermediate transpiler can propagate this tag into a newly generated intermediate artifact. In the intermediate file, the tag may become active and be recognized by the downstream compiler as a directive to suppress the specific error. This approach can facilitate an automated remediation process for certain build failures that avoids direct modification of ephemeral files and uses the original source code as a record for suppression. View details
Preview abstract We prove the following asymptotically tight lower bound for k-color discrepancy: For any k ≥ 2, there exists a hypergraph with n vertices such that its k-color discrepancy is at least Ω(√n). This improves on the previously known lower bound of Ω(√n/ log k) due to Caragiannis et al. [CLS25]. As an application, we show that our result implies improved lower bounds for group fair division. View details
Preview abstract This article delves into how Google Site Reliability Engineers (SREs) leverage Gemini 3 and the Gemini CLI to aggressively reduce Mean Time to Mitigation (MTTM) during real-world outages. By focusing on the SRE motto of "Eliminate Toil," the article walks through a simulated incident, demonstrating how an agentic CLI acts as a human-in-the-loop copilot across the entire incident lifecycle: from initial paging and investigation, through safe, tool-driven mitigation and root cause analysis, to automated postmortem generation and action item filing. This direct integration of Gemini's reasoning capabilities with operational data and internal tools creates a virtuous cycle where past incident learnings continuously inform and improve future solutions. View details
Preview abstract Despite advances in high performance computing, accurate numerical simulations of global atmospheric dynamics remain a challenge. The resolution required to fully resolve the vast range scales as well as the strong coupling with—often not fully-understood—physics renders such simulations computationally infeasible over time horizons relevant for long-term climate risk assessment. While data-driven parameterizations have shown some promise of alleviating these obstacles, the scarcity of high-quality training data and their lack of long-term stability typically hinders their ability to capture the risk of rare extreme events. In this work we present a general strategy for training variational (probabilistic) neural network models to non-intrusively correct under-resolved long-time simulations of turbulent climate systems. The approach is based on the paradigm introduced by Barthel Sorensen et al. (2024, https://doi.org/10.1029/2023ms004122) which involves training a post-processing correction operator on under-resolved simulations nudged toward a high-fidelity reference. Our variational framework enables us to learn the dynamics of the underlying system from very little training data and thus drastically improve the extrapolation capabilities of the previous deterministic state-of-the art—even when the statistics of that training data are far from converged. We investigate and compare three recently introduced variational network architectures and illustrate the benefits of our approach on an anisotropic quasi-geostrophic flow. For this prototype model our approach is able to not only accurately capture global statistics, but also the anistropic regional variation and the statistics of multiple extreme event metrics—demonstrating significant improvement over previously introduced deterministic architectures. View details
MoXaRt: Audio-Visual Object-Guided Sound Interaction for XR
Sieun Kim
Qianhui Zheng
Ruoyu Xu
Ravi Tejasvi
Anuva Kulkarni
Junyi Zhu
2026
Preview abstract In Extended Reality (XR), complex acoustic environments often overwhelm users, compromising both scene awareness and social engagement due to entangled sound sources. We introduce MoXaRt, a real-time XR system that uses audio-visual cues to separate these sources and enable fine-grained sound interaction. MoXaRt's core is a cascaded architecture that performs coarse, audio-only separation in parallel with visual detection of sources (e.g. faces, instruments). These visual anchors then guide refinement networks to isolate individual sources, separating complex mixes of up to five concurrent sources (e.g. two voices + three instruments) with ca. 2 second processing latency. We validate MoXaRt through a technical evaluation on a new, complex dataset we collected, and a 22-participant user study. Our results demonstrate that MoXaRt significantly improves communication clarity—boosting listening comprehension in noisy conditions by 33.2% (p=0.0058)—and significantly reduces cognitive load (M=7.50 vs. M=3.36, p<0.001), paving the way for more perceptive and socially adept XR experiences. View details
Preview abstract Generative AI is reshaping software development, yet its psychological impact remains under-researched. During May and August 2025 we conducted reflexive thematic analysis of interviews with 12 senior engineers (≥5 years experience) recruited from Western technology hubs to explore shifts in professional identity. We identify a central transition from "coder to conductor," where AI acts as a cognitive partner. Key findings include: (1) a re-architecting of focus from implementation to strategy; (2) a shift in productivity metrics from output to impact; and (3) a dual-impact on agency, where AI empowers autonomy but threatens competence through de-skilling anxieties. These findings suggest that as implementation becomes commoditised, organisational training and career progression must prioritise architectural mastery and metacognitive oversight to ensure sustained developer motivation and system integrity. View details
Agentic Coding Needs Proactivity, Not Just Autonomy
Georgios Evangelopoulos
(2026) (to appear)
Preview abstract Coding agents are rapidly changing the landscape of software development, moving from inline com- pletion to autonomous systems that edit repositories, open pull requests, respond to issues, and run scheduled or webhook triggered routines across the development life cycle. The next generation is increasingly described as proactive and long-horizon: agents should notice relevant changes before the developer asks, connect signals across tools, decide when to interrupt, and carry preferences across sessions. Yet the field lacks a precise account of what proactivity means for software development, how it differs from autonomy, what acceptance criteria proactive long-horizon tasks should satisfy, and which metrics determine whether unsolicited agent behavior is useful rather than merely active. We argue that proactive coding agents should be evaluated by the quality and improvement of their insight policy: the policy that decides what matters next, what evidence supports it, whether to surface it, and how to adapt after feedback. We re-anchor this view in mixed initiative interaction, introduce a three level taxonomy (Reactive, Scheduled, and Situation Aware), compare contemporary coding agents against five operational criteria, and sketch an active user simulation protocol with three evaluation targets: Insight Decision Quality (IDQ), Context Grounding Score (CGS), and Learning Lift (LL). View details
SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
Jian Zhang
Bangya Liu
Achuta Kadambi
Zhiwen Fan
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2026)
Preview abstract Large vision-language models (VLMs) still struggle with reliable 3D spatial reasoning, a core capability for embodied and physical AI systems. This limitation arises from their inability to capture fine-grained 3D geometry and spatial relationships. While recent efforts have introduced multi-view geometry transformers into VLMs, they typically fuse only the deep-layer features from vision and geometry encoders, discarding rich hierarchical signals and creating a fundamental bottleneck for spatial understanding. To overcome this, we propose SpatialStack, a general hierarchical fusion framework that progressively aligns vision, geometry, and language representations across the model hierarchy. Moving beyond conventional late-stage vision-geometry fusion, SpatialStack stacks and synchronizes multi-level geometric features with the language backbone, enabling the model to capture both local geometric precision and global contextual semantics. Building upon this framework, we develop VLM-SpatialStack, a model that achieves state-of-the-art performance on multiple 3D spatial reasoning benchmarks. Extensive experiments and ablations demonstrate that our multi-level fusion strategy consistently enhances 3D understanding and generalizes robustly across diverse spatial reasoning tasks, establishing SpatialStack as an effective and extensible design paradigm for vision-language-geometry integration in next-generation multimodal physical AI systems. View details
Preview abstract This talk addresses the challenges of operating Google's monitoring systems at scale, handling terabytes of telemetry data and preventing overload from diverse workloads. We'll explore how Google's internal client library and Monarch, its planet-scale time-series database, work together for cost-effective data collection. Key principles include a distributed push model, dynamic client-side data reduction, centralized retention, and periodic metric analysis. The session will then bridge these concepts to the open-source world, discussing our work with OpenTelemetry's OpAMP protocol to achieve similar scalable and efficient telemetry collection. Attendees will gain insights into adapting these principles for cost savings and learn about our collaboration with the OpAMP SIG to benefit the broader community. View details
SAC133 - SSAC Comments on Proposed Root KSK Algorithm Rollover
Wes Hardaker
Internet Corporation for Assigned Names and Numbers (ICANN), ICANN Security and Stability Advisory Committee (SSAC) Reports and Advisories (2026), pp. 9
Preview abstract The SSAC supports the transition from RSA with SHA-256 (Algorithm 8) to ECDSA P-256 with SHA-256 (Algorithm 13) as the cryptographic algorithm for the RootKSK. The root zone has relied on RSA-based algorithms since DNSSEC signing began in 2010. The algorithm did not change during the first KSK rollover in 2018 or during the second rollover currently underway and scheduled to complete in October 2026. Establishing a clear and predictable process for algorithm transitions is essential to the long-term security of the root zone, and the SSAC observes that the proposal addresses the Recommendation 23 of the SSR2 Review accordingly. The SSAC notes that the proposal builds upon the Root Zone DNSSEC Algorithm Rollover Study published by ICANN in May 2024, which assessed resolver and authoritative server support for alternative algorithms, analyzed rollover methodologies, and evaluated operational risks. The SSAC finds that the proposal implements the study’s recommendations. The SSAC also notes that this proposal is consistent with the SSAC’s prior work on DNSSEC key rollover, including SAC063, SAC073, SAC102, and SAC108. The SSAC encourages ICANN to proceed with this rollover. Specific comments on the proposal’s methodology, timeline, and operational readiness follow View details
Preview abstract Managing compiler build errors that can arise during infrastructure upgrades in large, polyglot codebases may be challenging, as manual remediation can be slow and some automated tools may not support modern language syntax. A system can provide automated error remediation by ingesting compiler diagnostics and analyzing source code using an Abstract Syntax Tree (AST). A recursive scope resolution algorithm, for example, can traverse the AST to identify a specific and narrowly-scoped code block at which to apply an error suppression. Conversely, this algorithmic complexity can be bypassed when lexical scope resolution is not required, and the system can identify the specific location of error suppressions directly from the error's exact coordinates. The system may then generate and apply language-specific patches, such as structured comments for JavaScript source files or line-scoped comments for TypeScript source files, for example, by using a transactional rewrite engine. This approach can provide a scalable method for managing automated code remediation, which may facilitate infrastructure upgrades by reducing the need for manual intervention. View details
×