Adarsh Kowdle
I am an Engineering Director on Google's Augmented Reality team, leading the efforts around geometric and human perception and working on end-to-end solutions from research to product at the intersection of real-time computer vision, geometric/human sensing, and applied machine learning, with shipped work such as the ARCore Depth API and The Relightables. Previously at Google, I was the Hardware/Systems Lead for uDepth, the real-time active depth sensing system on Pixel 4 that powers Face Unlock and computational photography use cases such as bokeh. My areas of interest are computer vision and machine learning, with a focus on real-time applications.
Previously, I was a Senior Scientist and part of the founding team at perceptiveIO, where I developed computer vision and machine learning algorithms for 3D sensing, visual recognition, and human-computer interaction. Before that, I spent three years as a Senior SDE / Researcher in the Applied Vision and Imaging Team at Microsoft, where I worked on Surface Hub among other projects. I also spent six months with the Interactive 3D Technologies group at Microsoft Research in Redmond on projects such as Holoportation.
I graduated with a PhD in Electrical and Computer Engineering from Cornell University in July 2013, advised by Prof. Tsuhan Chen. My thesis focused on interactive computer vision algorithms and image-based modeling: intelligently putting the user in the loop while leveraging the power of automatic algorithms.
Google Scholar Page
Authored Publications
Experiencing Visual Blocks for ML: Visual Prototyping of AI Pipelines
Na Li
Jing Jin
Michelle Carney
Jun Jiang
Xiuxiu Yuan
Kristen Wright
Mark Sherwood
Jason Mayes
Lin Chen
Jingtao Zhou
Zhongyi Zhou
Ping Yu
Ram Iyengar
ACM (2023) (to appear)
We demonstrate Visual Blocks for ML, a visual programming platform that facilitates rapid prototyping of ML-based multimedia applications. As the public version of Rapsai, it further integrates large language models and custom APIs into the platform. In this demonstration, we will showcase how to build interactive AI pipelines in a few drag-and-drops, how to perform interactive data augmentation, and how to integrate pipelines into Colabs. In addition, we demonstrate a wide range of community-contributed pipelines in Visual Blocks for ML, covering various aspects including interactive graphics, chains of large language models, computer vision, and multi-modal applications. Finally, we encourage students, designers, and ML practitioners to contribute ML pipelines through https://github.com/google/visualblocks/tree/main/pipelines to inspire creative use cases. Visual Blocks for ML is available at http://visualblocks.withgoogle.com.
Experiencing Rapid Prototyping of Machine Learning Based Multimedia Applications in Rapsai
Na Li
Jing Jin
Michelle Carney
Xiuxiu Yuan
Ping Yu
Ram Iyengar
CHI EA '23: Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, ACM, 448:1-4
We demonstrate Rapsai, a visual programming platform that aims to streamline the rapid and iterative development of end-to-end machine learning (ML)-based multimedia applications. Rapsai features a node-graph editor that enables interactive characterization and visualization of ML model performance, which facilitates the understanding of how the model behaves in different scenarios. Moreover, the platform streamlines end-to-end prototyping by providing interactive data augmentation and model comparison capabilities within a no-coding environment. Our demonstration showcases the versatility of Rapsai through several use cases, including virtual background, visual effects with depth estimation, and audio denoising. The implementation of Rapsai is intended to support ML practitioners in streamlining their workflow, making data-driven decisions, and comprehensively evaluating model behavior with real-world input.
Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming
Na Li
Jing Jin
Michelle Carney
Scott Joseph Miles
Maria Kleiner
Xiuxiu Yuan
Anuva Kulkarni
Xingyu “Bruce” Liu
Ahmed K Sabie
Abhishek Kar
Ping Yu
Ram Iyengar
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI), ACM
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows.
This study helped us derive six design goals, which informed Rapsai, a visual programming platform for rapid and iterative development of end-to-end ML-based multimedia applications. Rapsai is based on a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
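To make the node-graph idea concrete, here is a small, purely hypothetical sketch of a pipeline represented as a dependency graph and evaluated in topological order. The node names, data format, and helper function are illustrative assumptions and are not Rapsai's actual representation or API.

```python
# Hypothetical sketch of a node-graph pipeline, in the spirit of a visual
# programming tool: each node lists its inputs and a function, and the graph
# is evaluated in dependency order. Not Rapsai's actual data format or API.
from graphlib import TopologicalSorter

def run_pipeline(nodes, sources):
    """nodes: {name: (input_names, fn)}; sources: {name: value} for graph inputs."""
    graph = {name: set(inputs) for name, (inputs, _) in nodes.items()}
    results = dict(sources)
    for name in TopologicalSorter(graph).static_order():
        if name in results:                      # already-provided source value
            continue
        inputs, fn = nodes[name]
        results[name] = fn(*(results[i] for i in inputs))
    return results

# Toy example: two chained "model/effect" nodes over a 1x2 RGB image.
nodes = {
    "gray": (["image"], lambda img: [[sum(px) / 3 for px in row] for row in img]),
    "mask": (["gray"], lambda g: [[v > 0.5 for v in row] for row in g]),
}
out = run_pipeline(nodes, {"image": [[(0.9, 0.9, 0.9), (0.1, 0.1, 0.1)]]})
print(out["mask"])  # [[True, False]]
```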
DepthLab: Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality
Maksym Dzitsiuk
Luca Prasso
Ivo Duarte
Jason Dourgarian
Joao Afonso
Jose Pascoal
Josh Gladstone
Nuno Moura e Silva Cruces
Shahram Izadi
Konstantine Nicholas John Tsotsos
Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2020), pp. 829-843
Mobile devices with passive depth sensing capabilities are ubiquitous, and recently active depth sensors have become available on some tablets and VR/AR devices. Although real-time depth data is accessible, its rich value to mainstream AR applications has been sorely under-explored. Adoption of depth-based UX has been impeded by the complexity of performing even simple operations with raw depth data, such as detecting intersections or constructing meshes. In this paper, we introduce DepthLab, a software library that encapsulates a variety of depth-based UI/UX paradigms, including geometry-aware rendering (occlusion, shadows), surface interaction behaviors (physics-based collisions, avatar path planning), and visual effects (relighting, depth-of-field effects). We break down depth usage into localized depth, surface depth, and dense depth, and describe our real-time algorithms for interaction and rendering tasks. We present the design process, system, and components of DepthLab to streamline and centralize the development of interactive depth features. We have open-sourced our software to external developers, conducted performance evaluation, and discussed how DepthLab can accelerate the workflow of mobile AR designers and developers. We envision that DepthLab may help mobile AR developers amplify their prototyping efforts, empowering them to unleash their creativity and effortlessly integrate depth into mobile AR experiences.
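As a concrete illustration of the dense-depth category, the sketch below shows the basic geometry-aware occlusion test in NumPy: virtual content is composited only where it is closer to the camera than the environment depth reported for that pixel. This is a minimal sketch of the general idea under assumed array shapes and names, not DepthLab's actual implementation.

```python
import numpy as np

def composite_with_occlusion(camera_rgb, env_depth, virtual_rgb, virtual_depth):
    """Per-pixel occlusion test for AR compositing (illustrative sketch only).

    camera_rgb:    (H, W, 3) camera image.
    env_depth:     (H, W) environment depth in meters (e.g. from a depth API).
    virtual_rgb:   (H, W, 3) rendered virtual content.
    virtual_depth: (H, W) depth of the virtual content, np.inf where empty.
    """
    visible = virtual_depth < env_depth   # virtual fragment is in front of the real scene
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out
```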
Experiencing Real-time 3D Interaction with Depth Maps for Mobile Augmented Reality in DepthLab
Maksym Dzitsiuk
Luca Prasso
Ivo Duarte
Jason Dourgarian
Joao Afonso
Jose Pascoal
Josh Gladstone
Nuno Moura e Silva Cruces
Shahram Izadi
Konstantine Nicholas John Tsotsos
Adjunct Publication of the 33rd Annual ACM Symposium on User Interface Software and Technology, ACM (2020), pp. 108-110
We demonstrate DepthLab, a wide range of experiences using the ARCore Depth API that allows users to detect the shape and depth in the physical environment with a mobile phone. DepthLab encapsulates a variety of depth-based UI/UX paradigms, including geometry-aware rendering (occlusion, shadows, texture decals), surface interaction behaviors (physics, collision detection, avatar path planning), and visual effects (relighting, 3D-anchored focus and aperture effects, 3D photos). We have open-sourced our software at https://github.com/googlesamples/arcore-depth-lab to facilitate future research and development in depth-aware mobile AR experiences. With DepthLab, we aim to help mobile developers to effortlessly integrate depth into their AR experiences and amplify the expression of their creative vision.
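For the localized-depth use cases (for example, placing an anchor where the user taps), the key operation is back-projecting a single pixel and its depth value into a 3D point with a pinhole camera model, as in the minimal sketch below. The function name and intrinsics are illustrative assumptions, not the ARCore or DepthLab API.

```python
import numpy as np

def unproject_tap(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth depth_m (meters) into camera space
    using a pinhole model. Illustrative sketch, not the ARCore/DepthLab API."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Example: a tap at the principal point lands on the optical axis.
print(unproject_tap(320, 240, depth_m=1.5, fx=500, fy=500, cx=320, cy=240))
# prints [0.  0.  1.5]
```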
The Relightables: Volumetric Performance Capture of Humans with Realistic Relighting
Kaiwen Guo
Peter Lincoln
Philip Davidson
Xueming Yu
Matt Whalen
Geoff Harvey
Jason Dourgarian
Danhang Tang
Anastasia Tkach
Emily Cooper
Mingsong Dou
Graham Fyffe
Christoph Rhemann
Jonathan Taylor
Paul Debevec
Shahram Izadi
SIGGRAPH Asia (2019) (to appear)
We present "The Relightables", a volumetric capture system for photorealistic and high-quality relightable full-body performance capture. While significant progress has been made on volumetric capture systems, focusing on 3D geometric reconstruction with high resolution textures, much less work has been done to recover photometric properties needed for relighting. Results from such systems lack high-frequency details and the subject's shading is prebaked into the texture. In contrast, a large body of work has addressed relightable acquisition for image-based approaches, which photograph the subject under a set of basis lighting conditions and recombine the images to show the subject as they would appear in a target lighting environment. However, to date, these approaches have not been adapted for use in the context of a high-resolution volumetric capture system. Our method combines this ability to realistically relight humans for arbitrary environments with the benefits of free-viewpoint volumetric capture and new levels of geometric accuracy for dynamic performances. Our subjects are recorded inside a custom geodesic sphere outfitted with 331 custom color LED lights, an array of high-resolution cameras, and a set of custom high-resolution depth sensors. Our system innovates in multiple areas: First, we designed a novel active depth sensor to capture 12.4MP depth maps, which we describe in detail. Second, we show how to design a hybrid geometric and machine learning reconstruction pipeline to process the high resolution input and output a volumetric video. Third, we generate temporally consistent reflectance maps for dynamic performers by leveraging the information contained in two alternating color gradient illumination images acquired at 60Hz. Multiple experiments, comparisons, and applications show that The Relightables significantly improves upon the level of realism in placing volumetrically captured human performances into arbitrary CG scenes.
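The reflectance recovery builds on spherical gradient illumination: an image lit by a gradient along each axis and an image lit by the complementary gradient give a per-pixel ratio that behaves like a photometric normal. The NumPy sketch below shows that ratio in its simplest single-channel-per-axis form, as an assumed simplification; the actual system multiplexes the gradients in color at 60Hz and uses a far more sophisticated pipeline.

```python
import numpy as np

def normals_from_gradients(grad, grad_comp, eps=1e-6):
    """Photometric normals from spherical-gradient illumination (simplified sketch).

    grad:      (H, W, 3) image lit by gradient illumination along x, y, z
               (one axis per channel in this simplified setup).
    grad_comp: (H, W, 3) image lit by the complementary gradients.
    """
    n = (grad - grad_comp) / (grad + grad_comp + eps)  # per-axis ratio in [-1, 1]
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.maximum(norm, eps)                   # unit-length normals
```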
Deep Reflectance Fields - High-Quality Facial Reflectance Field Inference from Color Gradient Illumination
Abhi Meka
Christian Haene
Michael Zollhöfer
Graham Fyffe
Xueming Yu
Jason Dourgarian
Peter Denny
Sofien Bouaziz
Peter Lincoln
Matt Whalen
Geoff Harvey
Jonathan Taylor
Shahram Izadi
Paul Debevec
Christian Theobalt
Julien Valentin
Christoph Rhemann
SIGGRAPH (2019)
Photo-realistic relighting of human faces is a highly sought-after feature with many applications ranging from visual effects to truly immersive virtual experiences. Despite tremendous technological advances in the field, humans are often capable of distinguishing real faces from synthetic renders. Photo-realistically relighting any human face is indeed a challenge, with many difficulties ranging from modelling sub-surface scattering and blood flow to estimating the interaction between light and individual strands of hair. We introduce the first system that combines the ability to deal with dynamic performances with the realism of 4D reflectance fields, enabling photo-realistic relighting of non-static faces. The core of our method consists of a Deep Neural Network that is able to predict full 4D reflectance fields from two images captured under spherical gradient illumination. Extensive experiments not only show that two images under spherical gradient illumination can be easily captured in real time, but also that these particular images contain all the information needed to estimate the full reflectance field, including specularities and high frequency details. Finally, side-by-side comparisons demonstrate that the proposed system outperforms the current state-of-the-art in terms of realism and speed.
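Once a 4D reflectance field is available, here predicted by the network, relighting itself is linear: the output is a weighted sum of the basis images against the target environment's light intensities. The sketch below shows only that final step; the tensor shapes and names are assumptions for illustration.

```python
import numpy as np

def relight(reflectance_field, env_light):
    """Image-based relighting as a weighted sum of basis images (illustrative).

    reflectance_field: (L, H, W, 3) images of the subject lit by each of the
                       L basis lights, i.e. a reflectance field for one view.
    env_light:         (L, 3) RGB intensities of the target environment
                       sampled at the same L light directions.
    Returns the (H, W, 3) image of the subject under the target environment.
    """
    return np.einsum('lhwc,lc->hwc', reflectance_field, env_light)
```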
SOS: Stereo Matching in O(1) with Slanted Support Windows
Vladimir Tankovich
Michael John Schoenberg
Christoph Rhemann
Mirko Schmidt
Maksym Dzitsiuk
Julien Valentin
Shahram Izadi
IROS (2018)
Depth cameras have accelerated research in many areas of computer vision. Most triangulation-based depth cameras, whether structured light systems like the Kinect or active (assisted) stereo systems, are based on the principle of stereo matching. Depth from stereo is an active research topic dating back 30 years. Despite recent advances, algorithms usually trade off accuracy for speed. In particular, efficient methods rely on fronto-parallel assumptions to reduce the search space and keep computation low. We present SOS (Slanted O(1) Stereo), the first algorithm capable of leveraging slanted support windows without sacrificing speed or accuracy. We use an active stereo configuration, where an illuminator textures the scene. Under this setting, local methods, such as PatchMatch Stereo, obtain state-of-the-art results by jointly estimating disparities and slant, but at a large computational cost. We observe that these methods typically exploit local smoothness to simplify their initialization strategies. Our key insight is that local smoothness can in fact be used to amortize the computation not only within initialization, but across the entire stereo pipeline. Building on these insights, we propose a novel hierarchical initialization that is able to efficiently perform search over disparities and slants. We then show how this structure can be leveraged to provide high-quality depth maps. Extensive quantitative evaluations demonstrate that the proposed technique yields significantly more precise results than the current state of the art, but at a fraction of the computational cost. Our prototype implementation runs at 4000 fps on modern GPU architectures.
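To make the fronto-parallel versus slanted distinction concrete, the illustrative sketch below evaluates a matching cost over a window whose disparity follows a plane, d(x, y) = d0 + a*(x - x0) + b*(y - y0), rather than a constant. This brute-force form is for exposition only and uses assumed names; the paper's contribution is evaluating such costs in effectively constant time per pixel via hierarchical initialization and amortization.

```python
import numpy as np

def slanted_window_cost(left, right, x0, y0, d0, a, b, radius=3):
    """Mean absolute matching cost for a slanted support window (sketch only).

    left, right: (H, W) float grayscale images of a rectified stereo pair.
    The disparity inside the window is modeled as a plane,
        d(x, y) = d0 + a*(x - x0) + b*(y - y0),
    and the right image is sampled at x - d(x, y) with linear interpolation.
    """
    h, w = left.shape
    total, count = 0.0, 0
    for y in range(max(y0 - radius, 0), min(y0 + radius + 1, h)):
        xs = np.arange(max(x0 - radius, 0), min(x0 + radius + 1, w))
        d = d0 + a * (xs - x0) + b * (y - y0)
        sampled = np.interp(xs - d, np.arange(w), right[y])  # fractional disparities
        total += np.abs(left[y, xs] - sampled).sum()
        count += xs.size
    return total / count
```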
Depth from motion for smartphone AR
Julien Valentin
Neal Wadhwa
Max Dzitsiuk
Michael John Schoenberg
Vivek Verma
Ambrus Csaszar
Ivan Dryanovski
Joao Afonso
Jose Pascoal
Konstantine Nicholas John Tsotsos
Mira Angela Leung
Mirko Schmidt
Sameh Khamis
Vladimir Tankovich
Shahram Izadi
Christoph Rhemann
ACM Transactions on Graphics (2018)
Augmented reality (AR) for smartphones has matured from a technology for early adopters, available only on select high-end phones, to one that is truly available to the general public. One of the key breakthroughs has been in low-compute methods for six-degree-of-freedom (6DoF) tracking on phones using only the existing hardware (camera and inertial sensors). 6DoF tracking is the cornerstone of smartphone AR, allowing virtual content to be precisely locked on top of the real world. However, to really give users the impression of believable AR, one requires mobile depth. Without depth, even simple effects such as a virtual object being correctly occluded by the real world are impossible. Yet requiring a mobile depth sensor would severely restrict access to such features. In this article, we provide a novel pipeline for mobile depth that supports a wide array of mobile phones and uses only the existing monocular color sensor. Through several technical contributions, we provide the ability to compute low-latency dense depth maps using only a single CPU core on a wide range of (medium- to high-end) mobile phones. We demonstrate the capabilities of our approach on high-level AR applications, including real-time navigation and shopping.
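At the core of any motion-stereo pipeline is the same triangulation relation as in two-view stereo: after rectifying two frames whose relative pose (and hence baseline) comes from 6DoF tracking, depth follows from disparity as Z = f*b/d. The sketch below shows only this relation, with assumed names; it is not the paper's optimized pipeline.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m, eps=1e-6):
    """Triangulate depth (meters) from disparity (pixels) for a rectified pair.

    disparity_px: (H, W) disparity between two frames whose baseline comes
                  from the device's own 6DoF-tracked motion.
    """
    return focal_px * baseline_m / np.maximum(disparity_px, eps)

# Example: 8 px of disparity, 500 px focal length, 6 cm of camera travel.
print(depth_from_disparity(np.array([[8.0]]), focal_px=500.0, baseline_m=0.06))
# prints [[3.75]]  (meters)
```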
ActiveStereoNet: Unsupervised End-to-End Learning for Active Stereo Systems
Yinda Zhang
Sameh Khamis
Christoph Rhemann
Julien Valentin
Vladimir Tankovich
Michael Schoenberg
Shahram Izadi
European Conference on Computer Vision (2018)
In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1/30th of a pixel; it does not suffer from the common over-smoothing issues; it preserves the edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using a window-based cost aggregation with an adaptive support weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allow the network to reach compelling results. Finally we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground-truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive
quantitative and qualitative evaluations on real and synthetic data demonstrate state-of-the-art results in many challenging scenes.
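The sketch below illustrates the self-supervised idea in a simplified form: warp the right image with the predicted disparity and compare it to the left image after local contrast normalization, which removes much of the dependence on illumination. It omits the paper's adaptive support-weight cost aggregation and invalidation network, and the window sizes and names are assumptions.

```python
import numpy as np

def local_contrast_normalize(img, radius=4, eps=1e-2):
    """Normalize each pixel by the mean/std of its local window (simplified LCN)."""
    h, w = img.shape
    out = np.empty_like(img, dtype=np.float64)
    for y in range(h):
        for x in range(w):
            win = img[max(y - radius, 0):y + radius + 1,
                      max(x - radius, 0):x + radius + 1]
            out[y, x] = (img[y, x] - win.mean()) / (win.std() + eps)
    return out

def reconstruction_loss(left, right, disparity):
    """Photometric loss between the left image and the right image warped by
    the predicted disparity, compared after local contrast normalization.
    left, right, disparity: (H, W) float arrays. Simplified sketch only."""
    h, w = left.shape
    xs = np.arange(w, dtype=np.float64)
    warped = np.stack([np.interp(xs - disparity[y], xs, right[y]) for y in range(h)])
    return np.abs(local_contrast_normalize(left) - local_contrast_normalize(warped)).mean()
```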