Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 11189 publications
Preview abstract
Enterprise service centers, particularly in domains like People Operations, are critical hubs of organizational knowledge work. They face a persistent difficulty in disseminating the tacit, case-specific expertise of senior agents, which can lead to inconsistent service and slower onboarding for new hires. While existing Knowledge Management (KM) and Case-Based Reasoning (CBR) systems have improved the retrieval of historically similar cases, they inadvertently shift the cognitive burden of synthesizing this information to the time-constrained agent. This paper introduces the Dynamic Case Precedent (DCP) architecture, a novel socio-technical framework designed to address this gap. The DCP architecture moves beyond simple precedent recommendation to automated precedent synthesis. It achieves this by integrating a semantic retrieval model with the large-context reasoning capabilities of a generative Large Language Model (LLM). We propose a three-pillar framework—(1) Contextual Similarity Indexing, (2) Generative Insight Synthesis, and (3) Human-in-the-Loop Refinement. By analyzing multiple relevant historical cases to generate a concise summary of resolution patterns, the DCP architecture aims to reduce agent cognitive load, accelerate proficiency, and improve service consistency. This conceptual framework offers a new model for human-AI collaboration, framing the AI not as a mere information tool, but as an active partner in sensemaking.
View details
AgentHands: Generating Interactive Hands Gestures for Spatially Grounded Agent Conversations in XR
Ziyi Liu
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems, ACM
Preview abstract
Communicating spatial tasks via text or speech creates ``a mental mapping gap'' that limits an agent’s expressiveness. Inspired by co-speech gestures in face-to-face conversation, we propose \textsc{AgentHands}, an LLM-powered XR system that equips agents with hands to render responses clearer and more engaging. Guided by a design taxonomy distilled from a formative study (N=10), we implement a novel pipeline to generate and render a hand agent that augments conversational responses with synchronized, space-aware, and interactive hand gestures: using a meta-instruction, \textsc{AgentHands} generates verbal responses embedded with \textit{GestureEvents} aligned to specific words; each event specifies gesture type and parameters. At runtime, a parser converts events into time-stamped poses and motions, driving an animation system that renders expressive hands synchronized with speech. In a within-subjects study (N=12), \textsc{AgentHands} increased engagement and made spatially grounded conversations easier to follow compared to a speech-only baseline.
View details
Who Controls the Curriculum for AI? The Limits of Participatory Design for Educational AI
Michael Madaio
Learning Under Algorithmic Conditions, University of Minnesota Press (2026)
Preview abstract
Participatory design is a long-standing effort to shift control over technology design from technologists to users and communities impacted by technologies. For educational AI, this means involving students, families, teachers, and other stakeholders in shaping the design of AI systems. While promising, in this article, I situate the recent calls for participatory design of educational AI systems within a different historical tradition—that of contests over local control of educational curricula. I argue that approaches that attempt to steer the design and development of educational AI through participatory methods may inadvertently reproduce the history of political contestation of educational curricula, in ways that may privilege the most powerful communities, rather than those inequitably impacted. What might it look like to treat participatory AI design as a site for political contestation? How might these approaches avoid reproducing the same majoritarian tendencies that led to educational inequities in the first place?
View details
Preview abstract
For many practical applications of quantum computing, the slowest and most costly steps involve coherently accessing classical data. We help address this challenge by applying mass production techniques, which can sometimes allow us to perform operations many times in parallel for a cost that is comparable to a single execution[1-3]. We combine existing mass-production results with modern approaches for loading classical data using ``quantum read-only memory.'' We show that quantum mass production techniques offer no benefit when we consider a cost model that focuses purely on the number of non-Clifford gates. However, analyzing the constant factors in a more nuanced cost model, we find that it may be possible to obtain a reduction in cost of an order or magnitude or more for a variety reasonably-sized fault-tolerant quantum algorithms. We present several applications of quantum mass-production techniques beyond naive parallelization, including a strategy for reducing the cost of serial calls to the same data loading step.
View details
Gaze Target Estimation Anywhere with Concepts
Xu Cao
Houze Yang
Vipin Gunda
Inki Kim
Jim Rehg
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2026)
Preview abstract
Estimating human gaze targets in-the-wild is a formidable challenge. Existing computer vision algorithms rely on brittle, multi-stage pipelines that require explicit inputs like head bounding boxes and human pose, causing initial detection errors to cascade and lead to system failure. To overcome this, we introduce the \textbf{Promptable Gaze Target Estimation (PGE)} task, a new end-to-end, concept-driven paradigm. PGE conditions gaze prediction on flexible user text or visual prompts (e.g., "the boy in the red shirt" or "person in point [0.52, 0.48]") to identify a specific subject's target, which eliminates the rigid dependency on intermediate localization cues. We develop a scalable data engine to generate \textbf{Gaze-Co}, a dataset and benchmark of 120K high-quality, prompt-annotated image pairs. We also propose \textbf{AnyGaze}, the first model designed for PGE. AnyGaze uses a Transformer-based detector to fuse features from frozen encoders and simultaneously solves subject localization, in/out-of-frame presence, and gaze target heatmap estimation. AnyGaze achieves state-of-the-art performance on standard gaze target estimation benchmarks, setting a strong baseline for this new problem even on a difficult out-of-domain, real-world clinical dataset. We will open-source the AnyGaze model and the Gaze-Co benchmark.
View details
Preview abstract
In some multi-stage software build pipelines, downstream compiler errors may be reported against ephemeral, machine-generated intermediate artifacts rather than original, human-written source code, which can make remediation challenging. A system and method may address this by intercepting a downstream error, mapping its location back to the original source file, and programmatically injecting a dormant suppression tag into the original source code. During a subsequent build, an intermediate transpiler can propagate this tag into a newly generated intermediate artifact. In the intermediate file, the tag may become active and be recognized by the downstream compiler as a directive to suppress the specific error. This approach can facilitate an automated remediation process for certain build failures that avoids direct modification of ephemeral files and uses the original source code as a record for suppression.
View details
VISTA: A Test-Time Self-Improving Video Generation Agent
Hootan Nakhost
Xuan Long Do
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)
Preview abstract
Despite rapid advances in text-to-video (T2V) synthesis, generated video quality remains critically dependent on precise user prompts. Existing test-time optimization methods, successful in other domains, struggle with the multi-faceted nature of video. To address this, we introduce VISTA, a novel multi-agent system that autonomously refines prompts to improve video generation. VISTA operates in an iterative loop, first decomposing a user's idea into a structured temporal plan. After generation, the best video is identified through a robust pairwise tournament. This winning video is then critiqued by a trio of specialized agents focusing on visual, audio, and contextual fidelity. Finally, a reasoning agent synthesizes this feedback to introspectively rewrite and enhance the prompt for the next generation cycle. To rigorously evaluate our proposed approach, we introduce MovieGen-Bench, a new benchmark of diverse single- and multi-scene video generation tasks. Experiments show that while prior methods yield inconsistent gains, VISTA consistently improves video quality, achieving up to 60% pairwise win rate against state-of-the-art baselines. Human evaluators concur, preferring VISTA's outputs in 68% of comparisons.
View details
Neural general circulation models for modeling precipitation
Stephan Hoyer
Dmitrii Kochkov
Janni Yuval
Ian Langmore
Science Advances (2026)
Preview abstract
Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. While hybrid models combining machine learning and physics have emerged with the premise of improving precipitation simulations, none have proven sufficiently skillful or stable enough to outperform existing models in simulating precipitation.
Here, we present the first hybrid model that is trained directly on precipitation observations. The model runs at 2.8 degrees resolution and is built on the differentiable NeuralGCM framework. This model is stable for decadal simulations and demonstrates significant improvements over existing GCMs, ERA5 reanalysis, and a Global Cloud-Resolving Model in simulating precipitation. Our approach yields reduced biases, a more realistic precipitation distribution, improved representation of extremes, and a more accurate diurnal cycle.
Furthermore, it outperforms the ECMWF ensemble for mid-range weather forecasting.
This advance paves the way for more reliable simulations of current climate and for the ability to fully utilize the abundance of existing observations to further improve GCMs.
View details
Preview abstract
Generative AI (GenAI) is evolving from standalone tools to interconnected ecosystems that integrate chatbots, cloud platforms, and
third-party services. While this ecosystem model enables personalization and extended services, it also introduces complex information flows and amplifies privacy risks. Existing solutions focus on
system-level protections, offering little support for users to make
meaningful privacy choices. To address this gap, we conducted two
vignette-based survey studies with 486 participants and a followup interview study with 16 participants. We also explored users’
needs and preferences for privacy choice design across both GenAI
personalization and data-sharing. Our results reveal paradoxical
patterns: participants sometimes trusted third-party ecosystems
more for personalization but perceived greater control in first-party
ecosystems when data was shared externally. We discuss design implications for privacy choice interfaces that enhance transparency,
control, and trust in GenAI ecosystems.
View details
Phoenix: Rowhammer Attacks on DDR5 with Self-Correcting Synchronization
Preview
Michele Marazzi
Kaveh Razavi
Salman Qazi
Diego Meyer
Patrick Jattke
IEEE Security & Privacy (S&P) (2026)
Preview abstract
Modern user interfaces are complex composites, with elements originating from various sources, such as the operating system, apps, a web browser, or websites. Many security and privacy models implicitly depend on users correctly identifying an element's source, a concept we term ''surface attribution.'' Through two large-scale vignette-based surveys (N=4,400 and N=3,057), we present the first empirical measurement of this ability.
We find that users struggle, correctly attributing UI source only 55% of the time on desktop and 53% on mobile. Familiarity and strong brand cues significantly improve accuracy, whereas UI positioning, a long-held security design concept especially for browsers, has minimal impact. Furthermore, simply adding a ''Security & Privacy'' brand cue to Android permission prompts failed to improve attribution. These findings demonstrate a fundamental gap in users' mental models, indicating that relying on them to distinguish trusted UI is a fragile security paradigm.
View details
ARM MTE Performance in Practice
Preview
Taehyun Noh
Yingchen Wang
Tal Garfinkel
Mahesh Madhav
Mattan Erez
Shravan Narayan
Usenix Security (2026)
Usability Hasn’t Peaked: Exploring How Expressive Design Overcomes the Usability Plateau
Alyssa Sheehan
Bianca Gallardo
Ying Wang
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI ’26), April 13–17, 2026, Barcelona, Spain (2026)
Preview abstract
Critics have argued that mobile usability has largely been optimized, and that only incremental gains are possible. We set out to explore if the newest generation of design systems, which promote greater flexibility and a return to design basics, could produce substantially more usable designs while maintaining or increasing aesthetic judgments. Through a study with 48 diverse participants completing tasks in 10 different applications, we found that in designs created following Material 3 Expressive guidelines, users fixated on the correct screen element for a task 33% faster, completed tasks 20% faster, and rated experiences more positively compared to versions designed using the previous Material design system. These improvements in performance and aesthetic ratings challenge the premise of a usability plateau and show that mobile usability has not peaked. We illustrate specific opportunities to make mobile experiences more usable by returning to design fundamentals while highlighting risks of added flexibility.
View details
Peeking Ahead of the Field Study: Exploring VLM Personas as Support Tools for Embodied Studies in HCI
Xinyue Gui
Ding Xia
Mark Colley
Yuan Li
Vishal Chauhan
Anubhav Anubhav
Ehsan Javanmardi
Stela Hanbyeol Seo
Chia-Ming Chang
Manabu Tsukada
Takeo Igarashi
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 26)
Preview abstract
Field studies are irreplaceable but costly, time-consuming, and error-prone, which need careful preparation. Inspired by rapid-prototyping in manufacturing, we propose a fast, low-cost evaluation method using Vision-Language Model (VLM) personas to simulate outcomes comparable to field results. While LLMs show human-like reasoning and language capabilities, autonomous vehicle (AV)-pedestrian interaction requires spatial awareness, emotional empathy, and behavioral generation. This raises our research question: To what extent can VLM personas mimic human responses in field studies? We conducted parallel studies: 1) one real-world study with 20 participants, and 2) one video-study using 20 VLM personas, both on a street-crossing task. We compared their responses and interviewed five HCI researchers on potential applications. Results show that VLM personas mimic human response patterns (e.g., average crossing times of 5.25 s vs. 5.07 s) lack the behavioral variability and depth. They show promise for formative studies, field study preparation, and human data augmentation.
View details
From Correctness to Collaboration: A Human-Centered Taxonomy of AI Agent Behavior in Software Engineering
Sherry Shi
Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems (CHI EA ’26), ACM, New York, NY, USA (2026)
Preview abstract
The ongoing transition of Large Language Models in software engineering from code generators into autonomous agents requires a shift in how we define and measure success. While models are becoming more capable, the industry lacks a clear understanding of the behavioral norms that make an agent effective in collaborative software development in the enterprise. This work addresses this gap by presenting a taxonomy of desirable agent behaviors, synthesized from 91 sets of developer-defined rules for coding agents. We identify four core expectations: Adhere to Standards and Processes, Ensure Code Quality and Reliability, Solve Problems Effectively, and Collaborate with the Developer. These findings offer a concrete vocabulary for agent behavior, enabling researchers to move beyond correctness-only benchmarks and start designing evaluations that reflect the socio-technical nature of professional software development in enterprises.
View details