Alex Olwal
I am a Tech Lead/Manager on Google's Augmented Reality team and a founder of the Interaction Lab. I direct research and development of interaction technologies based on advances in display technology, low-power and high-speed sensing, wearables, actuation, electronic textiles, and human-computer interaction. I am passionate about accelerating innovation and disruption through tools, techniques, and devices that augment and empower human abilities. My research interests include augmented reality, ubiquitous computing, mobile devices, 3D user interfaces, interaction techniques, interfaces for accessibility and health, medical imaging, and software/hardware prototyping.
Google I/O 2022 Keynote: Augmented Language
Our Augmented Language project was featured in the I/O 2022 Keynote.
"Let's see what happens when we take our advances in translation and transcription, and deliver them in your line-of-sight", Sundar Pichai, CEO.
· 2020-Now Augmented Reality
· 2018-2020 Google AI: Research & Machine Intelligence
· 2017-2018 ATAP (Advanced Technology and Projects)
· 2016-2017 Wearables, Augmented and Virtual Reality
· 2015-2016 Project Aura, Glass and Beyond
· 2014-2015 Google X
My work builds on experience from research labs and institutions including the MIT Media Lab, Columbia University, the University of California, Santa Barbara, KTH Royal Institute of Technology, and Microsoft Research. I have taught at Stanford University, the Rhode Island School of Design, and KTH.
Portfolio: olwal.com
Authored Publications
UI Mobility Control in XR: Switching UI Positionings between Static, Dynamic, and Self Entities
Siyou Pei
Yang Zhang
Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, ACM, pp. 12 (to appear)
Extended reality (XR) has the potential for seamless user interface (UI) transitions across people, objects, and environments. However, the design space, applications, and common practices of 3D UI transitions remain underexplored. To address this gap, we conducted a need-finding study with 11 participants, identifying and distilling a taxonomy based on three types of UI placements: affixed to static, dynamic, or self entities. We further surveyed 113 commercial applications to understand the common practices of 3D UI mobility control, finding that only 6.2% of these applications allowed users to transition UIs between entities. In response, we built interaction prototypes to facilitate UI transitions between entities. We report results from a qualitative user study (N=14) on 3D UI mobility control using our FingerSwitches technique, which suggest that perceived usefulness is affected by the types of entities and environments. We aspire to address a vital need for UI mobility within XR.
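The three placement types in the taxonomy above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation; the class and method names are invented.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class AnchorType(Enum):
    STATIC = auto()   # affixed to the environment (e.g., a wall)
    DYNAMIC = auto()  # affixed to a movable object (e.g., a handheld tablet)
    SELF = auto()     # affixed to the user (e.g., wrist-locked)

@dataclass
class UIPanel:
    name: str
    anchor: AnchorType = AnchorType.STATIC
    history: list = field(default_factory=list)

    def switch_anchor(self, new_anchor: AnchorType) -> None:
        """Re-parent the panel to a new entity type, recording the transition."""
        self.history.append((self.anchor, new_anchor))
        self.anchor = new_anchor

panel = UIPanel("media_controls")
panel.switch_anchor(AnchorType.SELF)     # follow the user while walking
panel.switch_anchor(AnchorType.DYNAMIC)  # pin to a movable object
```

The recorded transition history mirrors the kind of UI mobility control the study examines: which entity a UI is affixed to, and when the user moves it.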
ChatDirector: Enhancing Video Conferencing with Space-Aware Scene Rendering and Speech-Driven Layout Transition
Brian Moreno Collins
Karthik Ramani
Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, ACM, pp. 16 (to appear)
Remote video conferencing systems (RVCS) are widely adopted in personal and professional communication. However, they often lack the co-presence experience of in-person meetings. This is largely due to the absence of intuitive visual cues and clear spatial relationships among remote participants, which can lead to speech interruptions and loss of attention. This paper presents ChatDirector, a novel RVCS that overcomes these limitations by incorporating space-aware visual presence and speech-aware attention transition assistance. ChatDirector employs a real-time pipeline that converts participants' RGB video streams into 3D portrait avatars and renders them in a virtual 3D scene. We also contribute a decision tree algorithm that directs the avatar layouts and behaviors based on participants' speech states. We report results from a user study (N=16) in which we evaluated ChatDirector. The satisfactory algorithm performance and complimentary subjective user feedback indicate that ChatDirector significantly enhances communication efficacy and user engagement.
Experiencing InstructPipe: Building Multi-modal AI Pipelines via Prompting LLMs and Visual Programming
Zhongyi Zhou
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Kristen Wright
Jason Mayes
Mark Sherwood
Ram Iyengar
Na Li
Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, ACM, pp. 5
Foundational multi-modal models have democratized AI access, yet the construction of complex, customizable machine learning pipelines by novice users remains a grand challenge. This paper demonstrates a visual programming system that allows novices to rapidly prototype multimodal AI pipelines. We first conducted a formative study with 58 contributors and collected 236 proposals of multimodal AI pipelines that served various practical needs. We then distilled our findings into a design matrix of primitive nodes for prototyping multimodal AI visual programming pipelines, and implemented a system with 65 nodes. To support users' rapid prototyping experience, we built InstructPipe, an AI assistant based on large language models (LLMs) that allows users to generate a pipeline by writing text-based instructions. We believe InstructPipe enhances novice users' onboarding experience with visual programming and the controllability of LLMs by offering non-experts a platform to easily revise the generated pipelines.
Experiencing Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication
Erzhen Hu
Mingyi Li
Seongkook Heo
Adjunct Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2024)
During remote communication, participants share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have enabled users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, the conventional 2D representation of digital objects restricts users' ability to spatially reference items in a shared immersive environment. To address these challenges, we propose Thing2Reality, an Extended Reality (XR) communication platform designed to enhance spontaneous discussions regarding both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Our system enables users to interact with remote objects or discuss concepts in a collaborative manner.
Experiencing Rapid Prototyping of Machine Learning Based Multimedia Applications in Rapsai
Na Li
Jing Jin
Michelle Carney
Xiuxiu Yuan
Ping Yu
Ram Iyengar
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, ACM, 448:1-4
We demonstrate Rapsai, a visual programming platform that aims to streamline the rapid and iterative development of end-to-end machine learning (ML)-based multimedia applications. Rapsai features a node-graph editor that enables interactive characterization and visualization of ML model performance, which facilitates the understanding of how the model behaves in different scenarios. Moreover, the platform streamlines end-to-end prototyping by providing interactive data augmentation and model comparison capabilities within a no-coding environment. Our demonstration showcases the versatility of Rapsai through several use cases, including virtual background, visual effects with depth estimation, and audio denoising. The implementation of Rapsai is intended to support ML practitioners in streamlining their workflow, making data-driven decisions, and comprehensively evaluating model behavior with real-world input.
Modeling and Improving Text Stability in Live Captions
Xingyu "Bruce" Liu
Jun Zhang
Leonardo Ferrer
Susan Xu
Vikas Bahirwani
Boris Smus
Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI), ACM, 208:1-9
InstructPipe: Building Visual Programming Pipelines with Human Instructions
Zhongyi Zhou
Jing Jin
Xiuxiu Yuan
Jun Jiang
Jingtao Zhou
Yiyi Huang
Kristen Wright
Jason Mayes
Mark Sherwood
Ram Iyengar
Na Li
arXiv, 2312.09672 (2023)
Visual programming provides beginner-level programmers with a coding-free experience to build their customized pipelines. Existing systems require users to build a pipeline entirely from scratch, implying that novice users need to set up and link appropriate nodes all by themselves, starting from a blank workspace. We present InstructPipe, an AI assistant that enables users to start prototyping machine learning (ML) pipelines with text instructions. We designed two LLM modules and a code interpreter to execute our solution. LLM modules generate pseudocode of a target pipeline, and the interpreter renders a pipeline in the node-graph editor for further human-AI collaboration. Technical evaluations reveal that InstructPipe reduces user interactions by 81.1% compared to traditional methods. Our user study (N=16) showed that InstructPipe empowers novice users to streamline their workflow in creating desired ML pipelines, reduce their learning curve, and spark innovative ideas with open-ended commands.
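The two-stage pattern the abstract describes, where LLM modules generate pseudocode and an interpreter renders it as a node graph, can be sketched as follows. This is a hypothetical illustration, not the InstructPipe implementation; the pseudocode grammar, node names, and stand-in LLM are invented.

```python
def fake_llm(instruction: str) -> str:
    """Stand-in for an LLM call; returns one-line, arrow-separated pseudocode."""
    # A real system would prompt an LLM with the instruction plus node specs.
    return "input_image -> mobilenet -> label_to_text -> output_text"

def interpret(pseudocode: str) -> dict:
    """Render pipeline pseudocode into a node-graph structure for an editor."""
    nodes = [n.strip() for n in pseudocode.split("->")]
    edges = list(zip(nodes, nodes[1:]))  # each node feeds the next
    return {"nodes": nodes, "edges": edges}

graph = interpret(fake_llm("classify what the camera sees"))
```

Rendering into an editable node graph, rather than emitting final code, is what leaves room for the human-AI collaboration step the abstract mentions: the user can inspect and adjust the generated pipeline.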
Experiencing Augmented Communication with Real-time Visuals using Large Language Models in Visual Captions
Xingyu 'Bruce' Liu
Vladimir Kirilyuk
Xiuxiu Yuan
Xiang ‘Anthony’ Chen
Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST), ACM (2023) (to appear)
Experiencing Visual Blocks for ML: Visual Prototyping of AI Pipelines
Na Li
Jing Jin
Michelle Carney
Jun Jiang
Xiuxiu Yuan
Kristen Wright
Mark Sherwood
Jason Mayes
Lin Chen
Jingtao Zhou
Zhongyi Zhou
Ping Yu
Ram Iyengar
ACM (2023) (to appear)
We demonstrate Visual Blocks for ML, a visual programming platform that facilitates rapid prototyping of ML-based multimedia applications. As the public version of Rapsai, it further integrates large language models and custom APIs into the platform. In this demonstration, we showcase how to build interactive AI pipelines in a few drag-and-drops, how to perform interactive data augmentation, and how to integrate pipelines into Colabs. In addition, we demonstrate a wide range of community-contributed pipelines in Visual Blocks for ML, covering various aspects including interactive graphics, chains of large language models, computer vision, and multi-modal applications. Finally, we encourage students, designers, and ML practitioners to contribute ML pipelines through https://github.com/google/visualblocks/tree/main/pipelines to inspire creative use cases. Visual Blocks for ML is available at http://visualblocks.withgoogle.com.
Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals
Xingyu Bruce Liu
Vladimir Kirilyuk
Xiuxiu Yuan
Xiang ‘Anthony’ Chen
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI), ACM, pp. 1-20
Computer-mediated platforms are increasingly facilitating verbal communication, and capabilities such as live captioning and noise cancellation enable people to understand each other better. We envision that visual augmentations that leverage semantics in the spoken language could also be helpful to illustrate complex or unfamiliar concepts. To advance our understanding of the interest in such capabilities, we conducted formative research through remote interviews (N=10) and crowdsourced a dataset of 1500 sentence-visual pairs across a wide range of contexts.
These insights informed Visual Captions, a real-time system that we integrated into a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We report on our findings from a lab study (N=26) and a two-week deployment study (N=10), which demonstrate how Visual Captions has the potential to help people improve their communication through visual augmentation in various scenarios.
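The core loop of proactively suggesting visuals for spoken sentences can be sketched minimally. This is not the authors' system: `suggest_visual` stands in for the paper's fine-tuned large language model, and the keyword table is invented so the example stays self-contained.

```python
# Hypothetical (visual_type, query) suggestions keyed by spoken keywords.
VISUAL_HINTS = {
    "tokyo": ("photo", "Tokyo skyline"),
    "dna": ("diagram", "DNA double helix"),
}

def suggest_visual(sentence: str):
    """Return a (visual_type, query) suggestion, or None to stay silent."""
    for keyword, suggestion in VISUAL_HINTS.items():
        if keyword in sentence.lower():
            return suggestion
    return None  # no confident suggestion; do not interrupt the conversation

suggestion = suggest_visual("We visited Tokyo last spring")
```

Returning None when nothing matches reflects a key design concern in open-vocabulary conversation: a suggestion system should stay quiet rather than surface irrelevant visuals.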