Mohammed Suhail
I am a Research Scientist with Google Research in Toronto. My research interests are in Neural Rendering and Generative Modelling. Please visit my website for more details and a complete list of publications.
Authored Publications
Associating Objects and their Effects in Unconstrained Monocular Video
Erika Lu
Zhengqi Li
Leonid Sigal
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023
We propose a method to decompose a video into a background and a set of
foreground layers, where the background captures stationary elements
while the foreground layers capture moving objects along with their
associated effects (e.g., shadows and reflections). Our approach is
designed for unconstrained monocular videos, with arbitrary camera and
object motion. Prior work that tackles this problem assumes that the
video can be mapped onto a fixed 2D canvas, severely limiting the
possible space of camera motion. Instead, our method applies recent
progress in monocular camera pose and depth estimation to create a
full, RGBD video layer for the background, along with a video layer for
each foreground object. To solve the underconstrained decomposition
problem, we propose a new loss formulation based on multi-view
consistency. We test our method on challenging videos with complex
camera motion and show significant qualitative improvement over current
methods.
Generalizable Patch-Based Neural Rendering
Leonid Sigal
European Conference on Computer Vision (2022) (to appear)
Neural rendering has received tremendous attention since the advent of
Neural Radiance Fields (NeRF), and has pushed the state-of-the-art on
novel-view synthesis considerably. The recent focus has been on models
that overfit to a single scene, and the few attempts to learn models
that can synthesize novel views of unseen scenes mostly consist of
combining deep convolutional features with a NeRF-like model. We
propose a different paradigm, where no deep visual features and no
NeRF-like volume rendering are needed. Our method is capable of
predicting the color of a target ray in a novel scene directly, just
from a collection of patches sampled from the scene. We first leverage
epipolar geometry to extract patches along the epipolar lines of each
reference view. Each patch is linearly projected into a 1D feature
vector, and a sequence of transformers processes the collection. For
positional encoding, we parameterize rays as in a light field
representation, with the crucial difference that the coordinates are
canonicalized with respect to the target ray, which makes our method
independent of the reference frame and improves generalization. We
show that our approach outperforms the state-of-the-art on novel view
synthesis of unseen scenes even when being trained with considerably
less data than prior work. Our code is available at
https://mohammedsuhail.net/gen_patch_neural_rendering.
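
As a rough illustration of the epipolar step described in the abstract above, the sketch below samples points along a target ray and projects them into a reference view; the projected points all fall on that view's epipolar line, which is where patches would then be extracted. This is a minimal sketch, not the paper's implementation: the function names `project` and `epipolar_samples`, the skew-free pinhole camera model, the `x_cam = R X + t` pose convention, and the example camera values are all assumptions made for illustration.

```python
def project(K, R, t, X):
    """Project world point X with a pinhole camera, assuming x_cam = R X + t.

    K is a 3x3 intrinsics matrix (skew is ignored here), R a 3x3 rotation,
    t a 3-vector. All names and conventions are illustrative assumptions.
    """
    x_cam = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if x_cam[2] <= 0:
        raise ValueError("point is behind the reference camera")
    u = K[0][0] * x_cam[0] / x_cam[2] + K[0][2]
    v = K[1][1] * x_cam[1] / x_cam[2] + K[1][2]
    return (u, v)


def epipolar_samples(origin, direction, depths, K, R, t):
    """Project points origin + d * direction (one per depth d) into a reference view.

    The returned pixel coordinates lie on the epipolar line of the target ray.
    """
    pixels = []
    for d in depths:
        X = [origin[i] + d * direction[i] for i in range(3)]
        pixels.append(project(K, R, t, X))
    return pixels


# Hypothetical setup: reference camera centred at (1, 0, 0), looking down +z,
# so R is the identity and t = -R @ centre = (-1, 0, 0).
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [-1.0, 0.0, 0.0]

# Target ray leaving the world origin along +z, sampled at three depths.
samples = epipolar_samples([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 2.0, 4.0], K, R, t)
```

In this configuration every sample shares the same vertical pixel coordinate, so the epipolar line is horizontal in the reference image.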
Light Field Neural Rendering
Leonid Sigal
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
Classical light field rendering for novel view synthesis can
accurately reproduce view-dependent effects such as reflection,
refraction, and translucency, but requires a dense view sampling of
the scene. Methods based on geometric reconstruction need only sparse
views, but cannot accurately model non-Lambertian effects. We
introduce a model that combines the strengths and mitigates the
limitations of these two directions. By operating on a
four-dimensional representation of the light field, our model learns
to represent view-dependent effects accurately. By enforcing geometric
constraints during training and inference, the scene geometry is
implicitly learned from a sparse set of views. Concretely, we
introduce a two-stage transformer-based model that first aggregates
features along epipolar lines, then aggregates features along
reference views to produce the color of a target ray. Our model
outperforms the state-of-the-art on multiple forward-facing and 360°
datasets, with larger margins on scenes with severe view-dependent
variations. Code and results can be found at
light-field-neural-rendering.github.io.
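
To make the idea of a four-dimensional light field representation concrete, the sketch below encodes a ray by its intersections with two parallel planes, the classical two-plane (light slab) parameterization. The abstract above does not say which parameterization the model uses, so treat this purely as an illustrative assumption; the function name `two_plane_coords` and the plane positions `z_near` and `z_far` are hypothetical.

```python
def two_plane_coords(origin, direction, z_near=0.0, z_far=1.0):
    """Return (u, v, s, t): where the ray crosses the planes z = z_near and z = z_far.

    Assumes the ray is not parallel to the planes (direction[2] != 0).
    """
    if direction[2] == 0:
        raise ValueError("ray is parallel to the parameterization planes")
    # Solve origin_z + k * direction_z = z for each plane.
    k_near = (z_near - origin[2]) / direction[2]
    k_far = (z_far - origin[2]) / direction[2]
    u = origin[0] + k_near * direction[0]
    v = origin[1] + k_near * direction[1]
    s = origin[0] + k_far * direction[0]
    t = origin[1] + k_far * direction[1]
    return (u, v, s, t)


# A ray starting at (0, 0, -1) and heading along (1, 0, 1).
coords = two_plane_coords([0.0, 0.0, -1.0], [1.0, 0.0, 1.0])
```

Two rays that trace the same line through space get the same four coordinates regardless of where along the line their origins sit, which is what makes this a representation of the ray rather than of a particular camera position.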