Abhishek Kar
I am currently a Research Scientist on the Augmented Reality team at Google, where I work on problems at the intersection of 3D computer vision and machine learning.
Prior to Google, I was the Machine Learning Lead at Fyusion Inc., a 3D computational photography startup based in San Francisco. I graduated from UC Berkeley in 2017, where I worked in Jitendra Malik's group on machine learning and 3D computer vision. I have also spent time at Microsoft Research, working on viewing large imagery on mobile devices, and with the awesome team at Fyusion, capturing "3D photos" with mobile devices and developing deep learning models for them. Some features I have shipped or worked on at Fyusion include 3D visual search, creation of user-generated AR/VR content, real-time style transfer on mobile devices, and automatic damage analysis on cars.
Authored Publications
LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs
Zezhou Cheng
Varun Jampani
Subhransu Maji
International Conference on Computer Vision (ICCV) (2023)
A critical obstacle preventing NeRF models from being deployed broadly in the wild is their reliance on accurate camera poses. Consequently, there is growing interest in extending NeRF models to jointly optimize camera poses and scene representation, which offers an alternative to off-the-shelf SfM pipelines which have well-understood failure modes. Existing approaches for unposed NeRF operate under limiting assumptions, such as a prior pose distribution or coarse pose initialization, making them less effective in a general setting. In this work, we propose a novel approach, LU-NeRF, that jointly estimates camera poses and neural radiance fields with relaxed assumptions on pose configuration. Our approach operates in a local-to-global manner, where we first optimize over local subsets of the data, dubbed "mini-scenes." LU-NeRF estimates local pose and geometry for this challenging few-shot task. The mini-scene poses are brought into a global reference frame through a robust pose synchronization step, where a final global optimization of pose and scene can be performed. We show our LU-NeRF pipeline outperforms prior attempts at unposed NeRF without making restrictive assumptions on the pose prior. This allows us to operate in the general SE(3) pose setting, unlike the baselines. Our results also indicate our model can be complementary to feature-based SfM pipelines as it compares favorably to COLMAP on low-texture and low-resolution images.
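The pose synchronization idea in the abstract — bringing independently estimated mini-scene poses into one global reference frame — can be illustrated with a minimal sketch. The version below only chains relative rotations along a spanning tree of mini-scene pairs; the paper's synchronization step is more robust than this, and all names and the composition convention here are illustrative assumptions.

```python
import numpy as np

def synchronize_poses(relative, root=0):
    """Bring per-mini-scene rotations into one global frame by
    chaining relative rotations along a spanning tree.

    relative: dict mapping (i, j) -> 3x3 rotation R_ij, with the
              (assumed) convention R_j_global = R_ij @ R_i_global.
    """
    global_R = {root: np.eye(3)}
    frontier = [root]
    # Breadth-first traversal over mini-scenes connected by an edge.
    while frontier:
        i = frontier.pop()
        for (a, b), R in relative.items():
            if a == i and b not in global_R:
                global_R[b] = R @ global_R[a]
                frontier.append(b)
            elif b == i and a not in global_R:
                global_R[a] = R.T @ global_R[b]  # invert the edge
                frontier.append(a)
    return global_R

# Toy example: three mini-scenes rotated about the z-axis.
def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

rel = {(0, 1): rot_z(0.3), (1, 2): rot_z(0.2)}
R = synchronize_poses(rel)
# Mini-scene 2 ends up rotated by 0.3 + 0.2 radians from the root.
assert np.allclose(R[2], rot_z(0.5))
```

A real synchronization would also handle noisy, cyclic pose graphs (e.g. by rotation averaging) rather than trusting a single tree of relative estimates.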
Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications through Visual Programming
Na Li
Jing Jin
Michelle Carney
Scott Joseph Miles
Maria Kleiner
Xiuxiu Yuan
Anuva Kulkarni
Xingyu “Bruce” Liu
Ahmed K Sabie
Ping Yu
Ram Iyengar
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI), ACM
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows.
This study helped us derive six design goals, which informed Rapsai, a visual programming platform for rapid and iterative development of end-to-end ML-based multimedia applications. Rapsai is based on a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
ASIC: Aligning Sparse in-the-wild Image Collections
Kamal Gupta
Varun Jampani
Abhinav Shrivastava
International Conference on Computer Vision (ICCV) (2023)
We present a method for joint alignment of sparse in-the-wild image collections of an object category. Most prior works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However, neither of the above assumptions holds true for the long tail of the objects present in the world. We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object/object category to obtain consistent dense correspondences across the collection. We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches and make them dense and accurate matches by optimizing a neural network that jointly maps the image collection into a learned canonical grid. Experiments on CUB and SPair-71k benchmarks demonstrate that our method can produce globally consistent and higher quality correspondences across the image collection when compared to existing self-supervised methods. Code and other material will be made available at https://kampta.github.io/asic.
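The "pairwise nearest neighbors from deep features" step the abstract describes can be sketched as mutual nearest-neighbor matching under cosine similarity. This is a minimal illustrative version operating on arbitrary feature matrices (standing in for ViT patch descriptors), not the paper's implementation:

```python
import numpy as np

def mutual_nearest_neighbors(feat_a, feat_b):
    """Noisy sparse matches between two images' patch descriptors:
    keep only pairs that are each other's nearest neighbor under
    cosine similarity."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = a @ b.T                 # cosine similarity matrix
    ab = sim.argmax(axis=1)       # best match in B for each patch of A
    ba = sim.argmax(axis=0)       # best match in A for each patch of B
    return [(i, j) for i, j in enumerate(ab) if ba[j] == i]

# Toy check: permuted copies of the same features match one-to-one.
rng = np.random.default_rng(0)
f = rng.standard_normal((6, 8))
perm = rng.permutation(6)
matches = mutual_nearest_neighbors(f, f[perm])
assert sorted(matches) == sorted((int(perm[j]), j) for j in range(6))
```

In the paper's setting these noisy matches are only the starting point; the dense, accurate correspondences come from the jointly optimized canonical-grid mapping.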
SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting
Varun Jampani*
Huiwen Chang*
Kyle Gregory Sargent
Dominik Philemon Kaeser
Ce Liu
ICCV 2021 (2021)
Single image 3D photography enables viewers to view a still image from novel viewpoints. Recent approaches for single-image view synthesis combine a monocular depth network with inpainting networks, resulting in compelling novel view synthesis results. A drawback of these approaches is their use of hard layering, which makes them unsuitable for modeling intricate appearance effects such as matting. We present SLIDE, a modular and unified system for single image 3D photography that uses a simple yet effective soft layering strategy to model such appearance effects. In addition, we propose a novel depth-aware training of the inpainting network suited to the 3D photography task. Extensive experimental analysis on 3 different view synthesis datasets, in combination with user studies on in-the-wild image collections, demonstrates the superior performance of our technique in comparison to existing strong baselines.
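The distinction between hard and soft layering comes down to how layers are blended: with a continuous per-pixel alpha rather than a binary mask, a pixel can mix foreground and background, which is what matting-like effects (hair, fuzzy boundaries) require. A minimal compositing sketch, with all arrays illustrative:

```python
import numpy as np

def composite_soft(fg, bg, alpha):
    """Soft-layer compositing: a continuous alpha in [0, 1] blends
    foreground and background per pixel, unlike a hard 0/1 layer
    mask that assigns each pixel wholly to one layer."""
    alpha = alpha[..., None]      # broadcast matte over RGB channels
    return alpha * fg + (1.0 - alpha) * bg

fg = np.ones((2, 2, 3))           # white foreground layer
bg = np.zeros((2, 2, 3))          # black background layer
alpha = np.array([[1.0, 0.5],
                  [0.5, 0.0]])    # soft matte, e.g. around hair
out = composite_soft(fg, bg, alpha)
assert np.allclose(out[0, 0], 1.0)   # pure foreground pixel
assert np.allclose(out[0, 1], 0.5)   # half-blended boundary pixel
```

A hard-layering system would be the special case where alpha takes only the values 0 and 1, which is exactly what prevents it from representing partial coverage.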
Free-Viewpoint Facial Re-Enactment from a Casual Capture
Srinivas Rao
Rodrigo Ortiz-Cayon
Matteo Munaro
Aidas Liaudanskas
Krunal Chande
Tobias Bertel
Christian Richardt
Alexander JB Trevor
Stefan Holzer
SIGGRAPH Asia 2020 Posters, Association for Computing Machinery, Virtual Event, Republic of Korea
We propose a system for free-viewpoint facial re-enactment from a casual video capture of a target subject. Our system can render and re-enact the subject consistently in all the captured views. Furthermore, our system also enables interactive free-viewpoint facial re-enactment of the target from novel views. The re-enactment of the target subject is driven by an expression sequence of a source subject, which is captured using a custom app running on an iPhone X. Our system handles large pose variations in the target subject while keeping the re-enactment consistent. We demonstrate the efficacy of our system by showing various applications.
Learning Independent Object Motion from Unlabelled Stereoscopic Videos
Zhe Cao
Christian Häne
Jitendra Malik
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
We present a system for learning motion maps of independently moving objects from stereo videos. The only annotations used in our system are 2D object bounding boxes, which introduce the notion of objects into our system. Unlike prior learning-based approaches, which have focused on predicting dense optical flow fields and/or depth maps for images, we propose to predict instance-specific 3D scene flow maps and instance masks, from which we derive a factored 3D motion map for each object instance. Our network takes the 3D geometry of the problem into account, which allows it to correlate the input images and distinguish moving objects from static ones. We present experiments evaluating the accuracy of our 3D flow vectors, as well as depth maps and projected 2D optical flow, where our jointly learned system outperforms earlier approaches trained for each task independently.
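The factoring the abstract describes — turning a dense per-pixel 3D scene flow map plus instance masks into one motion per object — can be illustrated in its simplest form by pooling flow vectors inside each mask. This is an assumed, illustrative reduction (a mean translation per instance); in the paper the factored motion maps are predicted by the network itself:

```python
import numpy as np

def object_motion_from_flow(scene_flow, instance_mask):
    """Factor a dense 3D scene-flow map (H x W x 3) into one motion
    vector per object by averaging flow inside each instance mask.
    Instance id 0 is treated as background/static (an assumption)."""
    motions = {}
    for obj_id in np.unique(instance_mask):
        if obj_id == 0:
            continue
        mask = instance_mask == obj_id
        motions[int(obj_id)] = scene_flow[mask].mean(axis=0)
    return motions

# Toy frame: a 4x4 flow field where object 1 moves 1 m along x.
flow = np.zeros((4, 4, 3))
mask = np.zeros((4, 4), dtype=int)
mask[1:3, 1:3] = 1
flow[1:3, 1:3] = [1.0, 0.0, 0.0]
m = object_motion_from_flow(flow, mask)
assert np.allclose(m[1], [1.0, 0.0, 0.0])
```

Averaging gives only a rigid translation per object; recovering full rigid motion would additionally require fitting a rotation to the masked flow.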