Arkanath Pathak
Authored Publications
Sequential Training of GANs Against GAN-Classifiers Reveals Correlated “Knowledge Gaps” Present Among Independently Trained GAN Instances
Nick Dufour
The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (2023)
Modern Generative Adversarial Networks (GANs) generate realistic images remarkably well. Previous work has demonstrated the feasibility of “GAN-classifiers” that are distinct from the co-trained discriminator, and operate on images generated from a frozen GAN. That such classifiers work at all affirms the existence of “knowledge gaps” (out-of-distribution artifacts across samples) present in GAN training. We iteratively train GAN-classifiers and train GANs that “fool” the classifiers (in an attempt to fill the knowledge gaps), and examine the effect on GAN training dynamics, output quality, and GAN-classifier generalization. We investigate two settings: a small DCGAN architecture trained on low dimensional images (MNIST), and StyleGAN2, a SOTA GAN architecture trained on high dimensional images (FFHQ). We find that the DCGAN is unable to effectively fool a held-out GAN-classifier without compromising the output quality. However, StyleGAN2 can fool held-out classifiers with no change in output quality, and this effect persists over multiple rounds of GAN/classifier training, which appears to reveal an ordering over optima in the generator parameter space. Finally, we study different classifier architectures and show that the architecture of the GAN-classifier has a strong influence on the set of its learned artifacts.
Predicting future video frames is extremely challenging, as there are many factors of variation that make up the dynamics of how frames change through time. Previously proposed solutions require complex network architectures and highly specialized computation, including segmentation masks, optical flow, and foreground and background separation. In this work, we question whether such handcrafted architectures are necessary and instead propose a different approach: maximizing the capacity of a standard convolutional neural network. We perform the first large-scale empirical study of the effect of capacity on video prediction models. We demonstrate our results on three different datasets: one for modeling object interactions, one for modeling human motion, and one for modeling first-person car driving.
Learning 6-DOF Grasping Interaction via Deep 3D Geometry-aware Representations
Xinchen Yan
Mohi Khansari
Abhinav Gupta
James Davidson
Honglak Lee
(2018)
This paper focuses on the problem of learning 6-DOF grasping with a parallel jaw gripper in simulation. Compared to existing approaches that are specialized in three-dimensional grasping (i.e., top-down grasping or side-grasping), using a 6-DOF grasping model allows the robot to learn a richer set of grasping interactions given fewer physical constraints, potentially enhancing the robustness of grasping and robot dexterity. However, learning 6-DOF grasping is challenging due to a high dimensional state space, difficulty in collecting large-scale data, and many variations of an object’s visual appearance (i.e., geometry, material, texture, and illumination). We propose the notion of a geometry-aware representation in grasping based on the assumption that knowledge of 3D geometry is at the heart of interaction. Our key idea is constraining and regularizing grasping interaction learning through 3D geometry prediction.