Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

1 - 15 of 859 publications
    InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
    Jing Jin
    Xiuxiu Yuan
    Jun Jiang
    Jingtao Zhou
    Yiyi Huang
    Zheng Xu
    Kristen Wright
    Jason Mayes
    Mark Sherwood
    Johnny Lee
    Alex Olwal
    Ram Iyengar
    Na Li
    Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 23
    Abstract: Visual programming has the potential to provide novice programmers with a low-code experience for building customized processing pipelines. Existing systems typically require users to build pipelines from scratch, implying that novice users are expected to set up and link appropriate nodes from a blank workspace. In this paper, we introduce InstructPipe, an AI assistant for prototyping machine learning (ML) pipelines with text instructions. We contribute two large language model (LLM) modules and a code interpreter as part of our framework. The LLM modules generate pseudocode for a target pipeline, and the interpreter renders the pipeline in the node-graph editor for further human-AI collaboration. Both a technical evaluation and a user evaluation (N=16) show that InstructPipe empowers users to streamline their ML pipeline workflow, reduce their learning curve, and leverage open-ended commands to spark innovative ideas.
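
    The two-stage idea described in the abstract — an LLM emits compact pipeline pseudocode, and an interpreter expands it into a node graph for the editor — can be sketched roughly as follows. This is a hedged illustration only: the pseudocode syntax, node names, and interpreter below are assumptions, not InstructPipe's actual formats.

        # Hypothetical sketch of interpreting LLM-emitted pipeline pseudocode
        # into a node graph; syntax and node types are illustrative assumptions.
        from dataclasses import dataclass, field

        @dataclass
        class Node:
            node_id: str
            node_type: str
            inputs: list = field(default_factory=list)  # upstream node ids

        def interpret(pseudocode: str) -> dict:
            """Parse lines like 'caption = image_captioner(input_image)'."""
            graph = {}
            for line in pseudocode.strip().splitlines():
                target, expr = (part.strip() for part in line.split("="))
                node_type, args = expr.split("(")
                inputs = [a.strip() for a in args.rstrip(")").split(",") if a.strip()]
                graph[target] = Node(target, node_type.strip(), inputs)
            return graph

        llm_output = """
        caption = image_captioner(input_image)
        story = llm_prompt(caption)
        """
        for node in interpret(llm_output).values():
            print(node)  # nodes and edges a node-graph editor could render
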
    Beyond Touchscreens: Dynamic and Multimodal Interaction Needs
    Melissa Barnhart Wantland
    Mai Kobori
    Universal Access in Human-Computer Interaction, Springer-Verlag (2025)
    Abstract: Today’s smartphone interactions are typically designed with one primary preset, accompanied by customization settings that can be manually adjusted. To promote the creation of contextually aware experiences, researchers have highlighted the factors that influence mobile device usage in the ability-based design framework. This paper expands upon existing frameworks and contributes to an empirical understanding of smartphone accessibility. Through a 10-day longitudinal diary study and video interviews with 24 individuals who do and do not identify as having a disability, the research also illustrates the reactions of reattempt, adaptation, and avoidance used in response to a lack of smartphone accessibility. Despite experiencing scenarios where accessibility settings could be leveraged, 20 out of 24 participants did not use accessibility settings on their smartphone. A total of 12 out of 24 participants had tried accessibility settings on their smartphones but concluded that they were not for them. This work highlights the need to shift current design practices to better serve the accessibility community.
    Abstract: Eye-based interaction techniques for extended reality, such as gaze and pinch, are simple to use but suffer from input precision issues. We present H2E, a fine- and coarse-grained pointing technique that cascades Hand, Head, and Eye inputs. As users initiate a pinch gesture, a cursor appears at the gaze point that can be dragged by head pointing before pinch confirmation. This has the potential advantage of adding a precision component without changing the semantics of the technique. In this paper, we describe the design and implementation of the technique. Furthermore, we present an evaluation of our method in a Fitts-based user study, exploring the speed-accuracy trade-offs against a gaze-and-pinch interaction baseline.
    Improving simulation-based origin-destination demand calibration using sample segment counts data
    Arwa Alanqary
    Yechen Li
    The 12th Triennial Symposium on Transportation Analysis conference (TRISTAN XII), Okinawa, Japan (2025)
    Abstract: This paper introduces a novel approach to demand estimation that utilizes partial observations of segment-level track counts. Building on established simulation-based demand estimation methods, we present a modified formulation that integrates sample track counts as a regularization term. This approach effectively addresses the underdetermination challenge in demand estimation, moving beyond the conventional reliance on a prior OD matrix. The proposed formulation aims to preserve the distribution of the observed track counts while optimizing the demand to align with observed path-level travel times. We tested this approach on Seattle's highway network with various congestion levels. Our findings reveal significant enhancements in the solution quality, particularly in accurately recovering ground truth demand patterns at both the OD and segment levels.
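
    One way to read the formulation sketched in the abstract — hedged, since the paper's exact objective is not reproduced here and all symbols below are assumptions — is as a simulation-based fit to observed path-level travel times, with the sampled segment (track) counts entering as a regularizer in place of a prior OD matrix:

        \min_{d \ge 0} \; \sum_{p \in P_{\mathrm{obs}}} \bigl( t_p(d) - \hat{t}_p \bigr)^2
        \;+\; \lambda \sum_{s \in S_{\mathrm{obs}}} \bigl( c_s(d) - \hat{c}_s \bigr)^2

    where d is the OD demand vector, t_p(d) the simulated travel time on observed path p, c_s(d) the simulated count on sampled segment s, hatted quantities the corresponding observations, and λ the regularization weight.
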
    Online-EYE: Multimodal Implicit Eye Tracking Calibration for XR
    Baosheng James Hou
    Lucy Abramyan
    Prasanthi Gurumurthy
    Khushman Patel
    Haley Adams
    Andrea Colaco
    Ken Pfeuffer
    Hans Gellersen
    Karan Ahuja
    2025
    Abstract: Unlike other inputs for VR that work out of the box, eye tracking typically requires custom calibration per user or session. We present a multimodal-input approach for implicit calibration of eye trackers in VR, leveraging UI interaction for continuous, background calibration. Our method analyzes gaze data alongside controller interactions with UI elements and, employing ML techniques, continuously refines the calibration matrix without interrupting users' current tasks, potentially eliminating the need for explicit calibration. We demonstrate the accuracy and effectiveness of this implicit approach across various tasks and real-time applications, achieving eye-tracking accuracy comparable to native, explicit calibration.
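
    The core intuition — when a user selects a UI element with the controller, the element's known position gives a weak label for where they were probably looking — can be sketched as a simple least-squares correction. This is a hedged toy illustration under that assumption; the actual Online-EYE system uses ML techniques and safeguards not shown here, and the data below is made up.

        import numpy as np

        # Hypothetical samples: raw gaze estimates captured at the moment the
        # user clicked a UI element, paired with that element's known center.
        raw_gaze   = np.array([[0.11, 0.52], [0.48, 0.21], [0.83, 0.79], [0.30, 0.65]])
        ui_targets = np.array([[0.10, 0.50], [0.50, 0.20], [0.80, 0.80], [0.30, 0.62]])

        # Fit an affine correction (the "calibration matrix"): target ≈ [gaze, 1] @ A.
        X = np.hstack([raw_gaze, np.ones((len(raw_gaze), 1))])
        A, *_ = np.linalg.lstsq(X, ui_targets, rcond=None)

        def correct(gaze_xy):
            """Apply the refined calibration to a new gaze sample."""
            return np.append(gaze_xy, 1.0) @ A

        print(correct([0.47, 0.23]))  # corrected gaze estimate

    In a background calibrator, new (gaze, target) pairs would keep arriving as the user interacts, and A would be re-fit or incrementally updated rather than computed once.
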
    Participatory AI Considerations for Advancing Racial Health Equity
    Andrea G. Parker
    Jatin Alla
    Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025) (to appear)
    Abstract: Health-related artificial intelligence (health AI) systems are being rapidly created, largely without input from racially minoritized communities who experience persistent health inequities and stand to be negatively affected if these systems are poorly designed. Addressing this problematic trend, we critically review prior work focused on the participatory design of health AI innovations (participatory AI research), surfacing eight gaps in this work that inhibit racial health equity and providing strategies for addressing these gaps. Our strategies emphasize that “participation” in design must go beyond typical focus areas of data collection, annotation, and application co-design, to also include co-generating overarching health AI agendas and policies. Further, participatory AI methods must prioritize community-centered design that supports collaborative learning around health equity and AI, addresses root causes of inequity and AI stakeholder power dynamics, centers relationalism and emotion, supports flourishing, and facilitates longitudinal design. These strategies will help catalyze research that advances racial health equity.
    Abstract: Generative AI is revolutionizing content creation and holds promise for real-time, personalized educational experiences. We investigated the effectiveness of converting textbook chapters into AI-generated podcasts and explored the impact of personalizing these podcasts for individual learner profiles. We conducted a 3x3 user study with 180 college students in the United States, comparing traditional textbook reading with both generalized and personalized AI-generated podcasts across three textbook subjects. The personalized podcasts were tailored to students’ majors, interests, and learning styles. Our findings show that students found the AI-generated podcast format to be more enjoyable than textbooks and that personalized podcasts led to significantly improved learning outcomes, although this was subject-specific. These results highlight that AI-generated podcasts can offer an engaging and effective modality transformation of textbook material, with personalization enhancing content relevance. We conclude with design recommendations for leveraging AI in education, informed by student feedback.
    Zoom in, Zoom out, Reframe: Domain Experts’ Strategies for Addressing Non-Experts’ Complex Questions
    Beverly Freeman
    Roma Ruparel
    Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI) (2025)
    Abstract: Consumers rely on the Internet for expert information in domains such as healthcare and law. Large Language Models (LLMs) have the potential to increase access to expert knowledge. However, past research has not addressed how to handle certain aspects of complex questions that commonly occur in expert-layperson interactions. We conducted in-depth interviews with 26 experts across multiple domains to understand how they experience and respond to challenges associated with non-experts’ questions. Results from a thematic analysis reveal three recurring strategies that experts across domains employ when fielding complex questions. Experts zoom in to clarify details of a broad information request, zoom out to address overly narrow questions or assumptions, and reframe when the underlying need is unstated or poorly represented. We discuss implications for the design of LLM-based experiences that facilitate access to expert information.
    Abstract: As large language models (LLMs) improve in their capacity to serve as personal AI assistants, their ability to output uniquely tailored, personalized responses that align with the soft preferences of their users is imperative for maximizing user satisfaction and retention. However, lay users are notoriously bad at prompt specification and often struggle to convey their latent preferences to AI assistants. To resolve this, we demonstrate that activation steering, an inference-time method, can effectively control the responses of LLMs toward expressing different preferences. In contrast to memory-based personalization methods that require long user histories, steering is extremely lightweight and easily controllable via an interpretable linear strength factor. We further conduct a within-subjects user study (n=14) to investigate how end users personalize their conversations through three different steerable chatbot interfaces. The results demonstrate the effectiveness of preference-based steering for aligning real-world conversations with user preferences, and we discuss qualitative findings on how diverse values around control, transparency, and usability of personalization lead users to prefer different interfaces.
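
    Activation steering as described in the abstract adds a preference direction, scaled by a strength factor, to a model's hidden activations at generation time. A minimal sketch using a PyTorch forward hook follows; the layer choice, steering vector, and strength value are illustrative assumptions, not the study's configuration.

        import torch

        def add_steering_hook(layer: torch.nn.Module, steering_vector: torch.Tensor,
                              strength: float):
            """Shift the layer's hidden states along a preference direction,
            scaled by an interpretable linear strength factor."""
            def hook(module, inputs, output):
                hidden = output[0] if isinstance(output, tuple) else output
                steered = hidden + strength * steering_vector.to(hidden.dtype)
                return (steered, *output[1:]) if isinstance(output, tuple) else steered
            return layer.register_forward_hook(hook)

        # Illustrative usage (model, layer index, and direction are assumptions):
        # handle = add_steering_hook(model.transformer.h[12], preference_direction, 4.0)
        # ...generate responses steered toward the preference...
        # handle.remove()  # lightweight: no fine-tuning or long user history required
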
    StreetReaderAI: Making Street View Accessible Using Context-Aware Multimodal AI
    Alex Fiannaca
    Nimer Jaber
    Victor Tsaran
    Proceedings of the 2025 ACM Symposium on User Interface Software and Technology (UIST'25) (to appear)
    Abstract: Interactive streetscape mapping tools such as Google Street View (GSV) and Meta Mapillary enable users to virtually navigate and experience real-world environments via immersive 360° imagery but remain fundamentally inaccessible to blind users. We introduce StreetReaderAI, the first-ever accessible street view tool, which combines context-aware, multimodal AI, accessible navigation controls, and conversational speech. With StreetReaderAI, blind users can virtually examine destinations, engage in open-world exploration, or virtually tour any of the over 220 billion images and 100+ countries where GSV is deployed. We iteratively designed StreetReaderAI with a mixed-visual-ability team and performed an evaluation with eleven blind users. Our findings demonstrate the value of an accessible street view in supporting POI investigations and remote route planning. We close by enumerating key guidelines for future work.
    Abstract: The rapid emergence of generative AI models and AI-powered systems has surfaced a variety of concerns around responsibility, safety, and inclusion. Some of these concerns address specific vulnerable communities, including people with disabilities. At the same time, these systems may introduce harms upon disabled users that do not fit neatly into existing accessibility classifications and may not be addressed by current accessibility practices. In this paper, we investigate how stakeholders across a variety of job types are encountering and addressing potentially negative impacts of AI on users with disabilities. Through interviews with 25 practitioners, we identify emerging challenges related to AI’s impact on disabled users, systemic obstacles that contribute to problems, and effective strategies for impacting change. Based on these findings, we offer suggestions for improving existing processes for creating AI-powered systems and supporting practitioners in developing skills to address these emerging challenges.
    XR Blocks: Accelerating Human-Centered AI + XR Innovation
    Nels Numan
    Evgenii Alekseev
    Alex Cooper
    Min Xia
    Scott Chung
    Jeremy Nelson
    Xiuxiu Yuan
    Jolica Dias
    Tim Bettridge
    Benjamin Hersh
    Michelle Huynh
    Konrad Piascik
    Ricardo Cabello
    Google, XR, XR Labs (2025)
    Abstract: We are on the cusp of a convergence between Artificial Intelligence (AI) and Extended Reality (XR) that promises to unlock new paradigms of interactive computing. However, a significant gap exists between the ecosystems of these two fields: while AI research and development is accelerated by mature frameworks like PyTorch and benchmarks like LMArena, prototyping novel AI-driven XR interactions remains a high-friction process, often requiring practitioners to manually integrate disparate, low-level systems for perception, rendering, and interaction. To bridge this gap, we present XR Blocks, a cross-platform framework designed to accelerate human-centered AI + XR innovation. XR Blocks provides a modular architecture with plug-and-play components for core abstractions in AI + XR: user, world, peers; interface, context, and agents. Crucially, it is designed with the mission of "minimum code from idea to reality", accelerating rapid prototyping of complex AI + XR apps. Built upon accessible technologies (WebXR, three.js, TensorFlow, Gemini), our toolkit lowers the barrier to entry for XR creators. We demonstrate its utility through a set of open-source templates, samples, and advanced demos, empowering the community to quickly move from concept to interactive prototype.
    Enhancing XR Auditory Realism via Scene-Aware Multimodal Acoustic Rendering
    Jihan Li
    Penghe Zu
    Pranav Sahay
    Maruchi Kim
    Jack Obeng-Marnu
    Farley Miller
    Mahitha Rachumalla
    Rajeev Nongpiur
    Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology, Association for Computing Machinery, New York, NY, USA (2025), 1–16
    Abstract: In Extended Reality (XR), rendering sound that accurately simulates real-world acoustics is pivotal in creating lifelike and believable virtual experiences. However, existing XR spatial audio rendering methods often struggle with real-time adaptation to diverse physical scenes, causing a sensory mismatch between visual and auditory cues that disrupts user immersion. To address this, we introduce SAMOSA, a novel on-device system that renders spatially accurate sound by dynamically adapting to its physical environment. SAMOSA leverages a synergistic multimodal scene representation by fusing real-time estimations of room geometry, surface materials, and semantic-driven acoustic context. This rich representation then enables efficient acoustic calibration via scene priors, allowing the system to synthesize a highly realistic Room Impulse Response (RIR). We validate our system through technical evaluation using acoustic metrics for RIR synthesis across various room configurations and sound types, alongside an expert evaluation (N=12). Evaluation results demonstrate SAMOSA’s feasibility and efficacy in enhancing XR auditory realism.
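
    Once a Room Impulse Response has been synthesized, applying it to a dry source signal is, at its core, a convolution. A minimal sketch under that assumption follows; the toy exponential-decay RIR is a placeholder, not SAMOSA's scene-aware synthesis.

        import numpy as np
        from scipy.signal import fftconvolve

        sample_rate = 48_000
        dry = np.random.randn(sample_rate)           # 1 s of placeholder source audio

        # Toy RIR: a direct path plus exponentially decaying diffuse reflections.
        t = np.arange(int(0.3 * sample_rate)) / sample_rate
        rir = 0.05 * np.exp(-t / 0.08) * np.random.randn(t.size)
        rir[0] = 1.0                                 # direct sound

        # Reverberated signal the listener would hear in the modeled room.
        wet = fftconvolve(dry, rir)[: dry.size]
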
    Abstract: A critical component in the trustworthiness of LLMs is reliable uncertainty communication, yet LLMs often use assertive language when conveying false claims, leading to over-reliance and eroded trust. We present the first systematic study of faithful confidence calibration of LLMs, benchmarking models' ability to use linguistic expressions of uncertainty that faithfully reflect their intrinsic uncertainty, across a comprehensive array of models, datasets, and prompting strategies. Our results demonstrate that LLMs largely fail at this task, and that existing interventions are insufficient: standard prompt approaches provide only marginal gains, and existing, factuality-based calibration techniques can even harm faithful calibration. To address this critical gap, we introduce MetaFaith, a novel prompt-based calibration approach inspired by human metacognition. We show that MetaFaith robustly improves faithful calibration across diverse models and task domains, enabling up to 61% improvement in faithfulness and achieving an 83% win rate over original generations as judged by humans.
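
    As a rough illustration of what a prompt-based, metacognition-inspired intervention of this kind can look like — the instruction wording and the client.generate call below are assumptions made for the sketch, not the actual MetaFaith prompt or API:

        # Illustrative only: ask the model to match its hedging language to its
        # internal confidence instead of defaulting to an assertive tone.
        FAITHFUL_CALIBRATION_INSTRUCTION = (
            "Before answering, silently assess how certain you actually are. "
            "Then phrase the answer with hedging ('I'm fairly sure', 'I might "
            "be wrong', 'I don't know') that matches that level of certainty."
        )

        def answer(client, question: str) -> str:
            # `client.generate` is a placeholder for whatever LLM API is in use.
            return client.generate(system=FAITHFUL_CALIBRATION_INSTRUCTION,
                                   prompt=question)
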
    Abstract: Speech recognition models, predominantly trained on standard speech, often exhibit lower accuracy for individuals with accents, dialects, or speech impairments. This disparity is particularly pronounced for economically or socially marginalized communities, including those with disabilities or diverse linguistic backgrounds. Project Euphonia, a Google initiative originally launched in English and dedicated to improving Automatic Speech Recognition (ASR) of disordered speech, is expanding its data collection and evaluation efforts to international languages such as Spanish, Japanese, French, and Hindi, in a continued effort to enhance inclusivity. This paper presents an overview of the extension of the processes and methods used for English data collection to more languages and locales, progress on the collected data, and details of our model evaluation process, focusing on meaning preservation based on Generative AI.