Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 15 of 859 publications
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Zheng Xu
Kristen Wright
Jason Mayes
Mark Sherwood
Johnny Lee
Alex Olwal
Ram Iyengar
Na Li
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
Preview abstract
Visual programming has the potential of providing novice programmers with a low-code experience to build customized processing pipelines. Existing systems typically require users to build pipelines from scratch, implying that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both technical and user evaluation (N=16) shows that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.
View details
Beyond Touchscreens: Dynamic and Multimodal Interaction Needs
Melissa Barnhart Wantland
Mai Kobori
Universal Access in Human-Computer Interaction, Springer-Verlag (2025)
Preview abstract
Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interview with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance, which were used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants tried accessibility settings on their smartphones, however identifying accessibility was not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
View details
H2E: Hand, Head, Eye: A Multimodal Cascade of Natural Inputs
Khushman Patel
Hans Gellersen
Ken Pfeuffer
IEEE VR (2025)
Preview abstract
Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use however suffer from input precision issues. We present H2E, a fine and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage that it can add a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze and pinch interaction baseline.
View details
Improving simulation-based origin-destination demand calibration using sample segment counts data
Arwa Alanqary
Yechen Li
The 12th Triennial Symposium on Transportation Analysis conference (TRISTAN XII), Okinawa, Japan (2025)
Preview abstract
This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the conventional reliance on a prior OD matrix. The proposed formulation aims to preserve the distribution of the observed track counts while optimizing the demand to align with observed path-level travel times. We tested this approach on Seattle's highway network with various congestion levels. Our findings reveal significant enhancements in the solution quality, particularly in accurately recovering ground truth demand patterns at both the OD and segment levels.
View details
Online-EYE: Multimodal Implicit Eye Tracking Calibration for XR
Baosheng James Hou
Lucy Abramyan
Prasanthi Gurumurthy
Khushman Patel
Haley Adams
Andrea Colaco
Ken Pfeuffer
Hans Gellersen
Karan Ahuja
2025
Preview abstract
Unlike other inputs for VR that work out of the box, eye tracking typically requires custom calibration per user or session. We present a multimodal inputs approach for implicit calibration of eye tracker in VR, leveraging UI interaction for continuous, background calibration. Our method analyzes gaze data alongside controller interaction with UI elements, and employing ML techniques it continuously refines the calibration matrix without interrupting users from their current tasks. Potentially eliminating the need for explicit calibration. We demonstrate the accuracy and effectiveness of this implicit approach across various tasks and real time applications achieving comparable eye tracking accuracy to native, explicit calibration.
View details
Participatory AI Considerations for Advancing Racial Health Equity
Andrea G. Parker
Jatin Alla
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025) (to appear)
PAIGE: Examining Student Learning Outcomes and Experiences with Personalized AI-Generated Podcasts
Tiffany Do
Usama Bin Shafqat
Elsie Ling
Νikhil Sarda
2025
Preview abstract
Generative AI is revolutionizing content creation and holds promise for real-time, personalized educational experiences. We investigated the effectiveness of converting textbook chapters into AI-generated podcasts and explored the impact of personalizing these podcasts
for individual learner profiles. We conducted a 3x3 user study with 180 college students in the United States, comparing traditional textbook reading with both generalized and personalized AI-generated podcasts across three textbook subjects. The personalized podcasts were tailored to students’ majors, interests, and learning styles. Our findings show that students found the AI-generated podcast format to be more enjoyable than textbooks and that personalized podcasts led to significantly improved learning outcomes, although this was subject-specific. These results highlight that AI-generated podcasts can offer an engaging and effective modality
transformation of textbook material, with personalization enhancing content relevance. We conclude with design recommendations for leveraging AI in education, informed by student feedback.
View details
Zoom in, Zoom out, Reframe: Domain Experts’ Strategies for Addressing Non-Experts’ Complex Questions
Beverly Freeman
Roma Ruparel
Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025)
Preview abstract
Consumers rely on the Internet for expert information in domains such as healthcare and law. Large Language Models (LLMs) have the potential to increase access to expert knowledge. However, past research has not addressed how to handle certain aspects of complex questions that commonly occur in expert-layperson interactions. We conducted in-depth interviews with 26 experts across multiple domains to understand how they experience and respond to challenges associated with non-experts’ questions. Results from a thematic analysis reveal three recurring strategies that experts across domains employ when fielding complex questions. Experts zoom in to clarify details of a broad information request, zoom out to address overly narrow questions or assumptions, and reframe when the underlying need is unstated or poorly represented. We discuss implications for the design of LLM-based experiences that facilitate access to expert information.
View details
Preview abstract
As large language models (LLMs) improve in their capacity to serve as personal AI assistants, their ability to output uniquely tailored, personalized responses that align with the soft preferences of their users is imperative for maximizing user satisfaction and retention. However, lay users are notoriously bad at prompt specification and often struggle with conveying their latent preferences to AI assistants. To resolve this, we demonstrate that activation steering, an inference-time method, can effectively control the response of the LLMs towards expressing different preferences. In contrast to memory-based personalization methods that require long user history, steering is extremely lightweight and easily-controllable via an interpretable linear strength factor. We further conduct a within-subjects user study (n=14) to investigate how end users personalize their conversations through three different steerable chatbot interfaces. The results demonstrate the effectiveness of preference-based steering for aligning real-world conversations with user preferences, and we discuss qualitative findings on how diverse values around control, transparency, and usability of personalization lead users to prefer different interfaces.
View details
StreetReaderAI: Making Street View Accessible Using Context-Aware Multimodal AI
Alex Fiannaca
Nimer Jaber
Victor Tsaran
Proceedings of the 2025 ACM Symposium on User Interface Software and Technology (UIST'25) (to appear)
Preview abstract
Interactive streetscape mapping tools such as Google Street View (GSV) and Meta Mapillary enable users to virtually navigate and experience real-world environments via immersive 360° imagery but remain fundamentally inaccessible to blind users. We introduce StreetReaderAI, the first-ever accessible street view tool, which combines context-aware, multimodal AI, accessible navigation controls, and conversational speech. With StreetReaderAI, blind users can virtually examine destinations, engage in open-world exploration, or virtually tour any of the over 220 billion images and 100+ countries where GSV is deployed. We iteratively designed StreetReaderAI with a mixed-visual ability team and performed an evaluation with eleven blind users. Our findings demonstrate the value of an accessible street view in supporting POI investigations and remote route planning. We close by enumerating key guidelines for future work.
View details
"Accessibility people, you go work on that thing of yours over there": Addressing Disability Inclusion in AI Product Organizations
Sanika Moharana
Erin Buehler
Michael Madaio
Vinita Tibdewal
Proceedings of AIES 2025 (2025) (to appear)
Preview abstract
The rapid emergence of generative AI models and AI powered systems has surfaced a variety of concerns around responsibility, safety, and inclusion. Some of these concerns address specific vulnerable communities, including people with disabilities. At the same time, these systems may introduce harms upon disabled users that do not fit neatly into existing accessibility classifications, and may not be addressed by current accessibility practices. In this paper, we investigate how stakeholders across a variety of job types are encountering and addressing potentially negative impacts of AI on users with disabilities. Through interviews with 25 practitioners, we identify emerging challenges related to AI’s impact on disabled users, systemic obstacles that contribute to problems, and effective strategies for impacting change. Based on these findings, we offer suggestions for improving existing processes for creating AI-powered systems and supporting practitioners in developing skills to address these emerging challenges.
View details
XR Blocks: Accelerating Human-Centered AI + XR Innovation
Nels Numan
Evgenii Alekseev
Alex Cooper
Min Xia
Scott Chung
Jeremy Nelson
Xiuxiu Yuan
Jolica Dias
Tim Bettridge
Benjamin Hersh
Michelle Huynh
Konrad Piascik
Ricardo Cabello
Google, XR, XR Labs (2025)
Preview abstract
We are on the cusp where Artificial Intelligence (AI) and Extended Reality (XR) are converging to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks provides a modular architecture with plug-and-play components for core abstraction in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", accelerating rapid prototyping of complex AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype.
View details
Enhancing XR Auditory Realism via Scene-Aware Multimodal Acoustic Rendering
Jihan Li
Penghe Zu
Pranav Sahay
Maruchi Kim
Jack Obeng-Marnu
Farley Miller
Mahitha Rachumalla
Rajeev Nongpiur
Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, Association for Computing Machinery, New York, NY, USA (2025), 1–16
Preview abstract
In Extended Reality (XR), rendering sound that accurately simulates real-world acoustics is pivotal in creating lifelike and believable virtual experiences. However, existing XR spatial audio rendering methods often struggle with real-time adaptation to diverse physical scenes, causing a sensory mismatch between visual and auditory cues that disrupts user immersion. To address this, we introduce SAMOSA, a novel on-device system that renders spatially accurate sound by dynamically adapting to its physical environment. SAMOSA leverages a synergistic multimodal scene representation by fusing real-time estimations of room geometry, surface materials, and semantic-driven acoustic context. This rich representation then enables efficient acoustic calibration via scene priors, allowing the system to synthesize a highly realistic Room Impulse Response (RIR). We validate our system through technical evaluation using acoustic metrics for RIR synthesis across various room configurations and sound types, alongside an expert evaluation (N=12). Evaluation results demonstrate SAMOSA’s feasibility and efficacy in enhancing XR auditory realism.
View details
Preview abstract
A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when conveying false claims, leading to over-reliance and eroded trust. We present the first systematic study of faithful confidence calibration of LLMs, benchmarking models' ability to use linguistic expressions of uncertainty that faithfully reflect their intrinsic uncertainty, across a comprehensive array of models, datasets, and prompting strategies. Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We show that MetaFaith robustly improves faithful calibration across diverse models and task domains, enabling up to 61% improvement in faithfulness and achieving an 83% win rate over original generations as judged by humans.
View details
Project Euphonia: Advancing Inclusive Speech Recognition through Expanded Data Collection and Evaluation
Alicia Martín
Bob MacDonald
Katrin Tomanek
Frontiers in Language Sciences (2025)
Preview abstract
Speech recognition models, predominantly trained on standard speech, often exhibit lower accuracy for individuals with accents, dialects, or speech impairments. This disparity is particularly pronounced for economically or socially marginalized communities, including those with disabilities or diverse linguistic backgrounds. Project Euphonia, a Google initiative originally launched in English dedicated to improving Automatic Speech Recognition (ASR) of disordered speech, is expanding its data collection and evaluation efforts to include international languages like Spanish, Japanese, French and Hindi, in a continued effort to enhance inclusivity. This paper presents an overview of the extension of processes and methods used for English data collection to more languages and locales, progress on the collected data, and details about our model evaluation process, focusing on meaning preservation based on Generative AI.
View details