Adarsh Kowdle
I am an Engineering Director on Google's Augmented Reality team, leading efforts in geometric and human perception. I work on end-to-end solutions, from research to product, at the intersection of real-time computer vision, geometric/human sensing and applied machine learning, such as the ARCore Depth API and The Relightables. Previously at Google, I was the Hardware/Systems Lead for uDepth, the real-time active depth sensing on Pixel 4 that powers Face Unlock and computational photography use cases such as bokeh. My areas of interest are computer vision and machine learning with a focus on real-time applications.
Previously, I was a Senior Scientist and part of the founding team at perceptiveIO, where I developed computer vision and machine learning algorithms for 3D sensing, visual recognition and human-computer interaction. Prior to this, I spent 3 years at Microsoft as a Senior SDE / Researcher in the Applied Vision and Imaging Team, where I worked on Surface Hub among other projects. I also worked with the Interactive 3D Technologies group at Microsoft Research in Redmond for 6 months on projects such as Holoportation.
I graduated with a PhD in Electrical and Computer Engineering from Cornell University in July 2013. I was advised by Prof. Tsuhan Chen. My thesis focused on interactive computer vision algorithms and image-based modeling: putting the user in the loop intelligently by leveraging the power of the automatic algorithm.
Authored Publications
Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming
Na Li
Jing Jin
Michelle Carney
Scott Joseph Miles
Maria Kleiner
Xiuxiu Yuan
Anuva Kulkarni
Xingyu “Bruce” Liu
Ahmed K Sabie
Abhishek Kar
Ping Yu
Ram Iyengar
Alex Olwal
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI), ACM
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows.
This study helped us derive six design goals, which informed Rapsai, a visual programming platform for rapid and iterative development of end-to-end ML-based multimedia applications. Rapsai is based on a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
Experiencing Visual Blocks for ML: Visual Prototyping of AI Pipelines
Na Li
Jing Jin
Michelle Carney
Jun Jiang
Xiuxiu Yuan
Kristen Wright
Mark Sherwood
Jason Mayes
Lin Chen
Jingtao Zhou
Zhongyi Zhou
Ping Yu
Ram Iyengar
Alex Olwal
ACM (2023) (to appear)
We demonstrate Visual Blocks for ML, a visual programming platform that facilitates rapid prototyping of ML-based multimedia applications. For this public version of Rapsai, we further integrated large language models and custom APIs into the platform. In this demonstration, we will showcase how to build interactive AI pipelines in a few drag-and-drops, how to perform interactive data augmentation, and how to integrate pipelines into Colabs. In addition, we demonstrate a wide range of community-contributed pipelines in Visual Blocks for ML, covering various aspects including interactive graphics, chains of large language models, computer vision, and multi-modal applications. Finally, we encourage students, designers, and ML practitioners to contribute ML pipelines through https://github.com/google/visualblocks/tree/main/pipelines to inspire creative use cases. Visual Blocks for ML is available at http://visualblocks.withgoogle.com.
Experiencing Rapid Prototyping of Machine Learning Based Multimedia Applications in Rapsai
Na Li
Jing Jin
Michelle Carney
Xiuxiu Yuan
Ping Yu
Ram Iyengar
Alex Olwal
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, ACM, 448:1-4
We demonstrate Rapsai, a visual programming platform that aims to streamline the rapid and iterative development of end-to-end machine learning (ML)-based multimedia applications. Rapsai features a node-graph editor that enables interactive characterization and visualization of ML model performance, which facilitates the understanding of how the model behaves in different scenarios. Moreover, the platform streamlines end-to-end prototyping by providing interactive data augmentation and model comparison capabilities within a no-coding environment. Our demonstration showcases the versatility of Rapsai through several use cases, including virtual background, visual effects with depth estimation, and audio denoising. The implementation of Rapsai is intended to support ML practitioners in streamlining their workflow, making data-driven decisions, and comprehensively evaluating model behavior with real-world input.
Experiencing Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality in DepthLab
Maksym Dzitsiuk
Luca Prasso
Ivo Duarte
Jason Dourgarian
Joao Afonso
Jose Pascoal
Josh Gladstone
Nuno Moura e Silva Cruces
Shahram Izadi
Konstantine Nicholas John Tsotsos
Adjunct Publication of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2020), pp. 108-110
We demonstrate DepthLab, a wide range of experiences using the ARCore Depth API that allows users to detect the shape and depth in the physical environment with a mobile phone. DepthLab encapsulates a variety of depth-based UI/UX paradigms, including geometry-aware rendering (occlusion, shadows, texture decals), surface interaction behaviors (physics, collision detection, avatar path planning), and visual effects (relighting, 3D-anchored focus and aperture effects, 3D photos). We have open-sourced our software at https://github.com/googlesamples/arcore-depth-lab to facilitate future research and development in depth-aware mobile AR experiences. With DepthLab, we aim to help mobile developers to effortlessly integrate depth into their AR experiences and amplify the expression of their creative vision.
DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality
Maksym Dzitsiuk
Luca Prasso
Ivo Duarte
Jason Dourgarian
Joao Afonso
Jose Pascoal
Josh Gladstone
Nuno Moura e Silva Cruces
Shahram Izadi
Konstantine Nicholas John Tsotsos
Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2020), pp. 829-843
Mobile devices with passive depth sensing capabilities are ubiquitous, and recently active depth sensors have become available on some tablets and VR/AR devices. Although real-time depth data is accessible, its rich value to mainstream AR applications has been sorely under-explored. Adoption of depth-based UX has been impeded by the complexity of performing even simple operations with raw depth data, such as detecting intersections or constructing meshes. In this paper, we introduce DepthLab, a software library that encapsulates a variety of depth-based UI/UX paradigms, including geometry-aware rendering (occlusion, shadows), surface interaction behaviors (physics-based collisions, avatar path planning), and visual effects (relighting, depth-of-field effects). We break down depth usage into localized depth, surface depth, and dense depth, and describe our real-time algorithms for interaction and rendering tasks. We present the design process, system, and components of DepthLab to streamline and centralize the development of interactive depth features. We have open-sourced our software to external developers, conducted performance evaluation, and discussed how DepthLab can accelerate the workflow of mobile AR designers and developers. We envision that DepthLab may help mobile AR developers amplify their prototyping efforts, empowering them to unleash their creativity and effortlessly integrate depth into mobile AR experiences.
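The occlusion effect named in the abstract above rests on a per-pixel depth comparison: a virtual pixel is shown only where it lies closer to the camera than the sensed physical surface. The following is a minimal sketch of that test in plain Python. The function name `composite_with_occlusion`, the list-of-lists image layout, and the use of `None` for "no virtual content" are illustrative assumptions and are not taken from DepthLab's API.

```python
from typing import List, Optional

Pixel = str  # stand-in for a color value, kept symbolic for readability


def composite_with_occlusion(
    camera: List[List[Pixel]],
    scene_depth: List[List[float]],
    virtual: List[List[Optional[Pixel]]],
    virtual_depth: List[List[float]],
) -> List[List[Pixel]]:
    """Overlay virtual pixels only where they are nearer than the real scene."""
    out = []
    for y, row in enumerate(camera):
        out_row = []
        for x, cam_px in enumerate(row):
            v = virtual[y][x]
            # Depth test: keep the virtual pixel only if it exists and is
            # strictly closer to the camera than the sensed surface.
            if v is not None and virtual_depth[y][x] < scene_depth[y][x]:
                out_row.append(v)
            else:
                out_row.append(cam_px)
        out.append(out_row)
    return out


# Hypothetical one-row scene: a wall at 3 m and a chair at 1 m.
camera = [["wall", "wall", "chair"]]
scene_depth = [[3.0, 3.0, 1.0]]  # metres from the camera
virtual = [[None, "avatar", "avatar"]]
virtual_depth = [[0.0, 2.0, 2.0]]  # avatar placed 2 m away

result = composite_with_occlusion(camera, scene_depth, virtual, virtual_depth)
print(result)  # [['wall', 'avatar', 'chair']]
```

In this toy scene the avatar is visible in front of the wall but hidden behind the chair, which is the behavior a depth-aware AR renderer aims for.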
Deep Reflectance Fields - High-Quality Facial Reflectance Field Inference from Color Gradient Illumination
Abhi Meka
Christian Haene
Michael Zollhöfer
Graham Fyffe
Xueming Yu
Jason Dourgarian
Peter Denny
Sofien Bouaziz
Peter Lincoln
Matt Whalen
Geoff Harvey
Jonathan Taylor
Shahram Izadi
Paul Debevec
Christian Theobalt
Julien Valentin
Christoph Rhemann
SIGGRAPH (2019)
Photo-realistic relighting of human faces is a highly sought-after feature with many applications ranging from visual effects to truly immersive virtual experiences. Despite tremendous technological advances in the field, humans are often capable of distinguishing real faces from synthetic renders. Photo-realistically relighting any human face is indeed a challenge with many difficulties ranging from modelling sub-surface scattering and blood flow to estimating the interaction between light and individual strands of hair. We introduce the first system that combines the ability to deal with dynamic performances with the realism of 4D reflectance fields, enabling photo-realistic relighting of non-static faces. The core of our method consists of a deep neural network that is able to predict full 4D reflectance fields from two images captured under spherical gradient illumination. Extensive experiments not only show that two images under spherical gradient illumination can be easily captured in real time, but also that these particular images contain all the information needed to estimate the full reflectance field, including specularities and high frequency details. Finally, side-by-side comparisons demonstrate that the proposed system outperforms the current state-of-the-art in terms of realism and speed.
The Relightables: Volumetric Performance Capture of Humans with Realistic Relighting
Kaiwen Guo
Peter Lincoln
Philip Davidson
Xueming Yu
Matt Whalen
Geoff Harvey
Jason Dourgarian
Danhang Tang
Anastasia Tkach
Emily Cooper
Mingsong Dou
Graham Fyffe
Christoph Rhemann
Jonathan Taylor
Paul Debevec
Shahram Izadi
SIGGRAPH Asia (2019) (to appear)
We present "The Relightables", a volumetric capture system for photorealistic and high quality relightable full-body performance capture. While significant progress has been made on volumetric capture systems, focusing on 3D geometric reconstruction with high resolution textures, much less work has been done to recover photometric properties needed for relighting. Results from such systems lack high-frequency details and the subject's shading is prebaked into the texture. In contrast, a large body of work has addressed relightable acquisition for image-based approaches, which photograph the subject under a set of basis lighting conditions and recombine the images to show the subject as they would appear in a target lighting environment. However, to date, these approaches have not been adapted for use in the context of a high-resolution volumetric capture system. Our method combines this ability to realistically relight humans for arbitrary environments, with the benefits of free-viewpoint volumetric capture and new levels of geometric accuracy for dynamic performances. Our subjects are recorded inside a custom geodesic sphere outfitted with 331 custom color LED lights, an array of high-resolution cameras, and a set of custom high-resolution depth sensors. Our system innovates in multiple areas: First, we designed a novel active depth sensor to capture 12.4MP depth maps, which we describe in detail. Second, we show how to design a hybrid geometric and machine learning reconstruction pipeline to process the high resolution input and output a volumetric video. Third, we generate temporally consistent reflectance maps for dynamic performers by leveraging the information contained in two alternating color gradient illumination images acquired at 60Hz. Multiple experiments, comparisons, and applications show that The Relightables significantly improves upon the level of realism in placing volumetrically captured human performances into arbitrary CG scenes.
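The image-based relighting that the abstract contrasts itself with relies on the linearity of light transport: an image under a target lighting environment can be approximated as a weighted sum of images captured under individual basis lights. The sketch below illustrates only that recombination step and is not the pipeline of The Relightables. The function name `relight`, the grayscale list-of-lists image layout, and the sample values are assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of intensities


def relight(basis_images: List[Image], light_weights: List[float]) -> Image:
    """Weighted sum of basis images, one weight per basis light."""
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    height = len(basis_images[0])
    width = len(basis_images[0][0])
    return [
        [
            sum(w * img[y][x] for img, w in zip(basis_images, light_weights))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two hypothetical basis captures: one light on the left, one on the right.
lit_from_left = [[1.0, 0.5, 0.0]]
lit_from_right = [[0.0, 0.5, 1.0]]

# Target environment: left light at half intensity, right light at a quarter.
relit = relight([lit_from_left, lit_from_right], [0.5, 0.25])
print(relit)  # [[0.5, 0.375, 0.25]]
```

In practice the weights would come from sampling the target environment map in the direction of each basis light, and the images would be color rather than grayscale.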
The Need 4 Speed in Real-Time Dense Visual Tracking
Christoph Rhemann
Jonathan Taylor
Philip Davidson
Mingsong Dou
Kaiwen Guo
Cem Keskin
Sameh Khamis
Danhang Tang
Vladimir Tankovich
Julien Valentin
Shahram Izadi
SIGGRAPH Asia (2018)
The advent of consumer depth cameras has incited the development of a new cohort of algorithms tackling challenging computer vision problems. The primary reason is that depth provides direct geometric information that is largely invariant to texture and illumination. As such, substantial progress has been made in human and object pose estimation, 3D reconstruction and simultaneous localization and mapping. Most of these algorithms naturally benefit from the ability to accurately track the pose of an object or scene of interest from one frame to the next. However, commercially available depth sensors (typically running at 30fps) can allow for large inter-frame motions to occur that make such tracking problematic. A high frame rate depth camera would thus greatly ameliorate these issues, and further increase the tractability of these computer vision problems. Nonetheless, the depth accuracy of recent systems for high-speed depth estimation [Fanello et al. 2017b] can degrade at high frame rates. This is because the active illumination employed produces a low SNR and thus a high exposure time is required to obtain a dense accurate depth image. Furthermore, in the presence of rapid motion, longer exposure times produce artifacts due to motion blur and necessitate a lower frame rate, which introduces large inter-frame motion that often yields tracking failures. In contrast, this paper proposes a novel combination of hardware and software components that avoids the need to compromise between a dense accurate depth map and a high frame rate. We document the creation of a full 3D capture system for high speed and quality depth estimation, and demonstrate its advantages in a variety of tracking and reconstruction tasks. We extend the state-of-the-art active stereo algorithm presented in Fanello et al. [2017b] by adding a space-time feature in the matching phase.
We also propose a machine learning based depth refinement step that is an order of magnitude faster than traditional postprocessing methods. We quantitatively and qualitatively demonstrate the benefits of the proposed algorithms in the acquisition of geometry in motion. Our pipeline executes in 1.1ms leveraging modern GPUs and off-the-shelf cameras and illumination components. We show how the sensor can be employed in many different applications, from [non-]rigid reconstructions to hand/face tracking. Further, we show many advantages over existing state-of-the-art depth camera technologies beyond framerate, including latency, motion artifacts, multi-path errors, and multi-sensor interference.
UltraFast 3D Sensing, Reconstruction and Understanding of People, Objects, and Environments
Anastasia Tkach
Christine Kaeser-Chen
Christoph Rhemann
Jonathan Taylor
Julien Valentin
Kaiwen Guo
Mingsong Dou
Sameh Khamis
Shahram Izadi
Sofien Bouaziz
Thomas Funkhouser
Yinda Zhang
This is a set of slide decks presenting a full tutorial on 3D capture and reconstruction, with high-level applications in VR and AR. The slides accompany the tutorial website:
https://augmentedperception.github.io/cvpr18/
Real-time Compression and Streaming of 4D Performances
Danhang Tang
Mingsong Dou
Peter Lincoln
Philip Davidson
Kaiwen Guo
Jonathan Taylor
Cem Keskin
Sofien Bouaziz
Shahram Izadi
ACM Transactions on Graphics (2018)
We introduce a realtime compression architecture for 4D performance capture that is two orders of magnitude faster than current state-of-the-art techniques, yet achieves comparable visual quality and bitrate. We note how much of the algorithmic complexity in traditional 4D compression arises from the necessity to encode geometry in an explicit model (i.e. a triangle mesh). In contrast, we propose an encoder that leverages an implicit model to represent the observed geometry and its changes through time.