Deqing Sun
I am a staff research scientist at Google working on computer vision and machine learning. I received my Ph.D. in Computer Science from Brown University, M.Phil. in Electronic Engineering from the Chinese University of Hong Kong, and B.Eng. in Electronic and Information Engineering from Harbin Institute of Technology. I was a postdoctoral fellow at Harvard University and then a senior research scientist at NVIDIA before joining Google. I served as an area chair for CVPR/ECCV/BMVC, and co-organized several workshops/tutorials at CVPR/ECCV/SIGGRAPH. I am a recipient of the PAMI Young Researcher award in 2020, the Longuet-Higgins prize at CVPR 2020, the best paper honorable mention award at CVPR 2018, and the first prize in the robust optical flow competition at CVPR 2018 and ECCV 2020.
Authored Publications
FILM: Frame Interpolation for Large Motion
Fitsum Reda
Eric Tabellion
Proceedings of the European Conference on Computer Vision (ECCV) (2022)
We present a frame interpolation algorithm that synthesizes an engaging slow-motion video from near-duplicate photos, which often exhibit large scene motion. Near-duplicate interpolation is an interesting new application, but large motion poses challenges to existing methods. To address this issue, we adapt a feature extractor that shares weights across the scales, and present a “scale-agnostic” motion estimator. It relies on the intuition that large motion at finer scales should be similar to small motion at coarser scales, which boosts the number of available pixels for large-motion supervision. To inpaint wide disocclusions caused by large motion and synthesize crisp frames, we propose to optimize our network with a Gram matrix loss that measures the correlation difference between features. To simplify the training process, we further propose a unified single-network approach that removes the reliance on additional optical-flow or depth networks and is trainable from frame triplets alone. Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark while performing favorably on Vimeo90K, Middlebury and UCF101. Source code and pre-trained models are available at https://film-net.github.io.
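As a rough illustration of the Gram matrix loss mentioned above, the following is a minimal NumPy sketch, not the FILM implementation. It assumes feature maps of shape (C, H, W) come from some pretrained extractor; the names `gram_matrix` and `gram_loss`, the normalization by H*W, and the unweighted sum over levels are all assumptions made for illustration.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Channel-correlation (Gram) matrix of a (C, H, W) feature map.

    Normalizing by the number of pixels (H * W) is an assumption of this sketch.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)  # each row holds one channel's responses
    return flat @ flat.T / (h * w)     # (C, C) channel correlations


def gram_loss(pred_feats, target_feats) -> float:
    """Sum over feature levels of the mean squared Gram-matrix difference.

    Hypothetical helper: the per-level weights and the choice of feature
    extractor used in the paper are not reproduced here.
    """
    total = 0.0
    for p, t in zip(pred_feats, target_feats):
        diff = gram_matrix(p) - gram_matrix(t)
        total += float(np.mean(diff ** 2))
    return total
```

Because the Gram matrix sums over all pixel positions, it compares feature correlations rather than exact pixel locations, which is one intuition for why such a loss tolerates small misalignments when inpainting disoccluded regions.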
Kubric: A scalable dataset generator
Anissa Yuenming Mak
Austin Stone
Carl Doersch
Cengiz Oztireli
Charles Herrmann
Daniel Rebain
Derek Nowrouzezahrai
Dmitry Lagun
Fangcheng Zhong
Florian Golemo
Francois Belletti
Henning Meyer
Hsueh-Ti (Derek) Liu
Issam Laradji
Klaus Greff
Kwang Moo Yi
Matan Sela
Noha Radwan
Thomas Kipf
Tianhao Wu
Vincent Sitzmann
Yilun Du
Yishu Miao
(2022)
Data is the driving force of machine learning. The amount and quality of training data are often more important for the performance of a system than the details of its architecture. Data is also an important tool for testing specific hypotheses and for empirically evaluating the behaviour of complex systems. Synthetic data generation is a powerful tool with four advantages: 1) it is cheap, 2) it supports rich ground-truth annotations, 3) it offers full control over the data, and 4) it can circumvent privacy and legal concerns. Unfortunately, the toolchain for generating data is less well developed than that for building models. We aim to improve this situation by introducing Kubric: a scalable open-source pipeline for generating realistic image and video data with rich ground-truth annotations.
We also publish a collection of generated datasets and baseline results on several vision tasks.
HumanGPS: Geodesic PreServing Feature for Dense Human Correspondence
Danhang "Danny" Tang
Mingsong Dou
Kaiwen Guo
Cem Keskin
Sofien Bouaziz
Ping Tan
Computer Vision and Pattern Recognition 2021 (2021), pp. 8
In this paper, we address the problem of building dense correspondences between human images under arbitrary camera viewpoints and body poses. Prior art either assumes small motion between frames or relies on local descriptors, which cannot handle large motion or visually ambiguous body parts, e.g., the left vs. right hand. In contrast, we propose a deep learning framework that maps each pixel to a feature space, where the feature distances reflect the geodesic distances among pixels as if they were projected onto the surface of a 3D human scan. To this end, we introduce novel loss functions to push features apart according to their geodesic distances on the surface. Without any semantic annotation, the proposed embeddings automatically learn to differentiate visually similar parts and align different subjects into a unified feature space. Extensive experiments show that the learned embeddings can produce accurate correspondences between images, with remarkable generalization in both intra-subject and inter-subject settings.
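To make "push features apart according to their geodesic distances" concrete, here is a minimal NumPy sketch of a generic triplet hinge loss whose margin grows with the geodesic gap. This is a swapped-in illustrative loss, not the HumanGPS loss functions; the function name `geodesic_triplet_loss`, the `scale` parameter, and the triplet sampling are hypothetical.

```python
import numpy as np


def geodesic_triplet_loss(feats: np.ndarray, geo: np.ndarray, triplets, scale: float = 1.0) -> float:
    """Hinge loss over (anchor, positive, negative) pixel-index triplets.

    feats: (N, D) per-pixel embeddings; geo: (N, N) geodesic distances on the
    body surface. A triplet counts only if the positive is geodesically closer
    to the anchor than the negative; the required feature-distance margin is
    proportional to that geodesic gap (an assumption of this sketch).
    """
    terms = []
    for a, p, n in triplets:
        gap = geo[a, n] - geo[a, p]
        if gap <= 0:
            continue  # positive is not closer than negative; skip this triplet
        d_ap = float(np.linalg.norm(feats[a] - feats[p]))
        d_an = float(np.linalg.norm(feats[a] - feats[n]))
        # Penalize when the negative is not at least `scale * gap` farther away.
        terms.append(max(0.0, d_ap - d_an + scale * gap))
    return sum(terms) / len(terms) if terms else 0.0
```

For example, if points lie on a line and their embeddings equal their positions, feature distances match geodesic distances and the loss is zero; collapsing all embeddings to one point is penalized by the mean geodesic gap.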
AutoFlow: Learning a Better Training Set for Optical Flow
Daniel Vlasic
Charles Herrmann
Varun Jampani
Michael Krainin
Huiwen Chang
Ramin Zabih
Ce Liu
(2021)
Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications. To automate the process, we present AutoFlow, a simple and effective method to render training data for optical flow that optimizes the performance of a model on a target dataset. AutoFlow takes a layered approach to render synthetic data, where the motion, shape, and appearance of each layer are controlled by learnable hyperparameters. Experimental results show that AutoFlow achieves state-of-the-art accuracy in pre-training both PWC-Net and RAFT. Our code and data are available at https://autoflow-google.github.io.
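The layered idea can be illustrated with a toy NumPy generator that renders only a ground-truth flow field: a background layer plus rectangular foreground layers, each with a constant translation drawn from ranges set by hyperparameters. This is a simplified sketch under stated assumptions, not AutoFlow itself. The function `render_layered_flow` and the hyperparameter names `num_layers`, `max_motion`, and `max_size` are hypothetical; image rendering, richer shapes and motions, and the outer loop that tunes the hyperparameters against a target dataset are omitted.

```python
import numpy as np


def render_layered_flow(h: int, w: int, hparams: dict, rng: np.random.Generator) -> np.ndarray:
    """Toy layered flow generator returning an (h, w, 2) flow field.

    Each layer moves by a constant translation whose components lie in
    [-max_motion, max_motion]. Layers are composited back to front, so later
    layers occlude earlier ones.
    """
    m = hparams["max_motion"]
    flow = np.empty((h, w, 2))
    flow[:] = rng.uniform(-m, m, size=2)  # background layer: one global motion
    max_h = max(1, int(h * hparams["max_size"]))
    max_w = max(1, int(w * hparams["max_size"]))
    for _ in range(hparams["num_layers"]):
        y0, x0 = rng.integers(0, h), rng.integers(0, w)
        sh, sw = rng.integers(1, max_h + 1), rng.integers(1, max_w + 1)
        # Slicing clips the rectangle at the image border automatically.
        flow[y0:y0 + sh, x0:x0 + sw] = rng.uniform(-m, m, size=2)
    return flow
```

In this sketch, treating `max_motion` and similar values as tunable parameters mirrors the abstract's notion of learnable hyperparameters; how AutoFlow actually parameterizes and optimizes them is described in the paper.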
Learnable Cost Volume Using the Cayley Representation
Taihong Xiao
Jinwei Yuan
Xin-Yu Zhang
Kehan Xu
The European Conference on Computer Vision (ECCV) (2020)
The cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors. However, the standard inner product in the commonly used cost volume may limit the representation capacity of flow models, because it neglects the correlation among different channel dimensions and weighs each dimension equally. To address this issue, we propose a learnable cost volume (LCV) using an elliptical inner product, which generalizes the standard inner product by a positive definite kernel matrix. To guarantee its positive definiteness, we perform spectral decomposition on the kernel matrix and re-parameterize it via the Cayley representation. The proposed LCV is a lightweight module and can be easily plugged into existing models to replace the vanilla cost volume. Experimental results show that the LCV module not only improves the accuracy of state-of-the-art models on standard benchmarks, but also improves their robustness to illumination changes, noise, and adversarial perturbations of the input signals.
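A minimal NumPy sketch of the elliptical inner product follows, assuming the kernel is built as W = U diag(λ) Uᵀ, with U orthogonal from the Cayley transform U = (I − A)(I + A)⁻¹ of a skew-symmetric A. Keeping λ positive via an exponential, the function names, and the zero-padded local search window are all assumptions of this sketch, not details taken from the paper.

```python
import numpy as np


def cayley(a_params: np.ndarray, c: int) -> np.ndarray:
    """Orthogonal (C, C) matrix from C*(C-1)/2 unconstrained parameters."""
    a = np.zeros((c, c))
    a[np.triu_indices(c, k=1)] = a_params
    a = a - a.T  # skew-symmetric: A^T = -A, so I + A is always invertible
    eye = np.eye(c)
    # (I + A)^-1 (I - A) equals (I - A)(I + A)^-1 because the factors commute.
    return np.linalg.solve(eye + a, eye - a)


def elliptical_kernel(a_params: np.ndarray, log_eigs: np.ndarray) -> np.ndarray:
    """Positive definite kernel W = U diag(exp(log_eigs)) U^T."""
    u = cayley(a_params, len(log_eigs))
    return u @ np.diag(np.exp(log_eigs)) @ u.T


def learnable_cost_volume(f1, f2, a_params, log_eigs, radius: int = 1) -> np.ndarray:
    """Cost f1(x)^T W f2(x + d) for every displacement d in a (2r+1)^2 window.

    f1, f2: (H, W, C) feature maps; out-of-bounds features are zero-padded.
    Returns an (H, W, (2r+1)^2) array ordered by (dy, dx) in row-major order.
    """
    h, w, _ = f1.shape
    f1w = f1 @ elliptical_kernel(a_params, log_eigs)  # per-pixel f1^T W
    pad = np.pad(f2, ((radius, radius), (radius, radius), (0, 0)))
    costs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            costs.append(np.sum(f1w * shifted, axis=-1))
    return np.stack(costs, axis=-1)
```

With all parameters set to zero, U = I and W = I, so the module reduces to the standard inner-product cost volume; this is one way to see that the elliptical form generalizes the vanilla one.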