Sujoy Paul
I am a Research Scientist at Google working at the intersection of computer vision and machine learning, with specific interests in self-supervised learning, semantic segmentation, domain adaptation, image generation, and video and scene analysis. My interests also include reinforcement learning for robotics, along with imitation learning and policy adaptation. Prior to joining Google, I received my PhD from the University of California, Riverside, where I worked on learning from limited supervision for static and dynamic tasks in computer vision and robotics.
Research Areas
Authored Publications
Test-time Adaptation with Slot-centric Models
Mihir Prabhudesai
Anirudh Goyal
Gaurav Aggarwal
Thomas Kipf
Deepak Pathak
Katerina Fragkiadaki
International Conference on Machine Learning (2023), pp. 28151-28166
Current visual detectors, though impressive within their training distribution, often fail to parse out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases. Recent slot-centric generative models attempt to decompose scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised slot-centric scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives. We evaluate Slot-TTA across multiple input modalities, images or 3D point clouds, and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors and alternative test-time adaptation methods. Project Webpage: http://slot-tta.github.io/
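The per-example adaptation recipe can be illustrated with a toy stand-in: the sketch below uses a small linear autoencoder (an assumption for illustration, not the paper's slot-centric architecture) and adapts a copy of its parameters to a single test example by gradient descent on a reconstruction objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3

# "Pretrained" linear autoencoder: encode z = W x, decode x_hat = W.T z.
# Stands in for the slot-centric model; the adaptation recipe is the same.
W = rng.normal(scale=0.3, size=(k, d))

def recon_loss(W, x):
    r = W.T @ (W @ x) - x
    return float(r @ r)

def recon_grad(W, x):
    # Analytic gradient of ||W.T W x - x||^2 with respect to W.
    r = W.T @ (W @ x) - x
    return 2.0 * (np.outer(W @ x, r) + np.outer(W @ r, x))

# A single (out-of-distribution) test example; adapt a copy of the
# parameters to it, leaving the original model untouched.
x = rng.normal(size=d)
W_adapted = W.copy()
for _ in range(100):
    g = recon_grad(W_adapted, x)
    step = 0.1
    # Halve the step size until it decreases the loss (simple backtracking).
    while step > 1e-8 and recon_loss(W_adapted - step * g, x) >= recon_loss(W_adapted, x):
        step /= 2.0
    if recon_loss(W_adapted - step * g, x) < recon_loss(W_adapted, x):
        W_adapted -= step * g

print(recon_loss(W, x), recon_loss(W_adapted, x))
```

After adaptation, the reconstruction error on the test example drops, which is the signal Slot-TTA exploits: in the actual model, improving the self-supervised reconstruction objective also improves the scene decomposition.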
What can we do with just the model? A simple knowledge extraction framework
Ansh Khurana
Gaurav Aggarwal
International Conference on Machine Learning (ICML), Principles of Distribution Shift Workshop (2022) (to appear)
We consider the problem of adapting semantic segmentation models to new target domains using only the trained source model, without the source data. Not only is this setting much harder than having access to the source data, it is also necessary in many practical situations where source data is unavailable due to privacy and storage constraints. Our algorithm has two parts: first, we update the normalization statistics, which helps compensate for the distribution shift, and second, we transfer knowledge from the source model by adhering to certain equivariant and invariant transforms. These transforms help extract knowledge efficiently, beyond vanilla self-training. Through extensive experiments on multiple semantic segmentation tasks, we show how such a simple framework can be effective in extracting knowledge from the source model across a variety of problem settings, and performs much better than or on par with current state-of-the-art methods that are specifically tuned for the respective settings.
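The first part of the framework, recomputing normalization statistics on target data, can be sketched in isolation. The minimal illustration below (a single feature channel and plain batch statistics, an assumption for brevity rather than the full method) shows how re-estimated statistics compensate for a shifted target distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Source-domain statistics as stored in a BatchNorm-style layer.
src_mean, src_var = 0.0, 1.0

# Target-domain features arrive shifted and scaled (distribution shift).
target_feats = rng.normal(loc=2.0, scale=3.0, size=(512,))

# Step 1 of the framework: replace the stale source normalization
# statistics with statistics estimated on the unlabeled target data.
tgt_mean = target_feats.mean()
tgt_var = target_feats.var()

normalized = (target_feats - tgt_mean) / np.sqrt(tgt_var + 1e-5)
print(normalized.mean(), normalized.std())
```

With the source statistics, these features would be normalized to roughly mean 2 and standard deviation 3; with the re-estimated target statistics, they land back near the zero-mean, unit-variance regime the downstream layers were trained on.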
Novel Class Discovery without Forgetting
Joseph K J
Gaurav Aggarwal
Soma Biswas
Piyush Rai
Kai Han
Vineeth N Balasubramanian
European Conference on Computer Vision (ECCV) (2022)
Humans possess an innate ability to identify and differentiate instances that they are not familiar with, by leveraging and adapting the knowledge that they have acquired so far. Importantly, they achieve this without deteriorating the performance on their earlier learning. Inspired by this, we identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model to incrementally discover novel categories of instances from unlabeled data, while maintaining its performance on the previously seen categories. We propose 1) a method to generate pseudo latent representations which act as a proxy for (no longer available) labeled data, thereby alleviating forgetting, 2) a mutual-information based regularizer which enhances unsupervised discovery of novel classes, and 3) a simple Known Class Identifier which aids generalized inference when the testing data contains instances from both seen and unseen categories. We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery. Our extensive evaluations reveal that existing models catastrophically forget previously seen categories while identifying novel categories, whereas our method effectively balances the competing objectives. We hope our work will attract further research into this newly identified pragmatic problem setting.
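The routing role of a known-class identifier can be illustrated with a simple confidence-threshold stand-in (the paper's identifier is a learned component; the threshold rule below is a hypothetical simplification): a test sample is sent to the seen-class head when the model is confident, and to the novel-class discovery head otherwise.

```python
import numpy as np

def known_class_identifier(logits, threshold=0.5):
    # Illustrative routing rule: treat a sample as "seen" if the
    # softmax over seen-class logits is peaked, otherwise "novel".
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return "seen" if probs.max() > threshold else "novel"

confident = np.array([5.0, 0.1, 0.2])   # peaked on one seen class
uncertain = np.array([0.3, 0.2, 0.1])   # flat -> likely a novel class
print(known_class_identifier(confident), known_class_identifier(uncertain))
```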
Spacing Loss for Discovering Novel Categories
Joseph K J
Gaurav Aggarwal
Soma Biswas
Piyush Rai
Kai Han
Vineeth N Balasubramanian
Computer Vision and Pattern Recognition (CVPR) Workshop on Continual Learning in Computer Vision (2022)
Novel Class Discovery (NCD) is a learning paradigm where a machine learning model is tasked to semantically group instances from unlabeled data, by utilizing labeled instances from a disjoint set of classes. In this work, we first characterize existing NCD approaches into single-stage and two-stage methods based on whether they require access to labeled and unlabeled data together while discovering new classes. Next, we devise a simple yet powerful loss function that enforces separability in the latent space using cues from multi-dimensional scaling, which we refer to as Spacing Loss.
Our proposed formulation can either operate as a standalone method or can be plugged into existing methods to enhance them. We validate the efficacy of Spacing Loss with thorough experimental evaluation across multiple settings on CIFAR-10 and CIFAR-100 datasets.
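The idea of a loss that enforces separability in the latent space can be illustrated with a pairwise hinge penalty on class centers. This is a hedged sketch of the general principle, not the paper's exact multi-dimensional-scaling formulation; `spacing_penalty` and `margin` are names introduced here for illustration.

```python
import numpy as np

def spacing_penalty(centers, margin=2.0):
    # Hinge penalty that grows when any two class centers in the
    # latent space lie closer together than `margin`.
    k = len(centers)
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            d = np.linalg.norm(centers[i] - centers[j])
            total += max(0.0, margin - d) ** 2
    return total

close = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])   # crowded latents
spread = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])  # well separated
print(spacing_penalty(close), spacing_penalty(spread))
```

Minimizing such a penalty pushes crowded clusters apart while leaving already well-separated clusters untouched, which is the "plug into existing methods" behavior the abstract describes.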
Cross-domain Imitation from Observations
Dripta S Raychaudhuri
Jeroen van Baar
Amit K Roy-Chowdhury
International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research (PMLR) (2021)
Imitation learning seeks to circumvent the difficulty of designing proper reward functions for training agents by utilizing expert behavior. With environments modeled as Markov Decision Processes (MDPs), most existing imitation algorithms are contingent on the availability of expert demonstrations in the same MDP as the one in which a new imitation policy is to be learned. In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDPs. These discrepancies across domains could include differing dynamics, viewpoint or morphology; we present a novel framework to learn correspondences across such domains. Importantly, in contrast to prior works, we use unpaired and unaligned trajectories containing only states in the expert domain to learn this correspondence. To do so, we utilize a cycle-consistency constraint on both the state space and a domain-agnostic latent space. In addition, we enforce consistency on the temporal position of states via a normalized position estimator function, to align the trajectories across the two domains. Once this correspondence is found, we can directly transfer demonstrations from one domain to the other and use them for imitation. Experiments across a wide variety of challenging domains demonstrate the efficacy of our approach.
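The cycle-consistency constraint on the state space can be sketched with hypothetical linear maps standing in for the learned correspondence networks: mapping expert states to the agent domain and back should recover the originals, and the cycle loss measures any mismatch. `F`, `G`, and `cycle_loss` are illustrative names introduced here, not the paper's notation.

```python
import numpy as np

# Hypothetical linear correspondence maps between the two state spaces
# (placeholders for the learned networks in the actual framework).
F = np.array([[2.0, 0.0], [0.0, 0.5]])   # expert -> agent
G = np.linalg.inv(F)                      # agent -> expert

def cycle_loss(states, fwd, bwd):
    # Cycle-consistency: mapping each state across domains and back
    # should return the original state; penalize the squared error.
    recon = states @ fwd.T @ bwd.T
    return float(np.mean((recon - states) ** 2))

expert_states = np.random.default_rng(3).normal(size=(10, 2))
print(cycle_loss(expert_states, F, G))
```

In training, `F` and `G` would be optimized jointly so that this loss (together with its latent-space counterpart and the temporal-position constraint) drives the two maps toward mutually consistent correspondences.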