Janne Kontkanen
Janne Kontkanen works on machine learning, computer vision, and graphics algorithms in Google’s augmented perception team. Previously he worked on Google Earth rendering technology and in the film industry. He has a PhD in computer graphics from Aalto University (formerly Helsinki University of Technology) and received an Academy Award for his deep compositing work at DreamWorks Animation.
Authored Publications
Generative Powers of Ten
Xiaojuan Wang
Steve Seitz
Ben Mildenhall
Pratul Srinivasan
Dor Verbin
Aleksander Hołyński
We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. This representation allows us to render continuously zooming videos, or explore different scales of the scene interactively. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
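The consistency mechanism at the heart of the joint multi-scale sampling can be pictured as repeatedly reconciling adjacent zoom levels. The Python sketch below is only an illustration of that idea, not the paper's sampler: it blends each finer level back into the center of the coarser one, assuming a fixed integer zoom factor and images whose sides divide evenly by it; in the actual method such a step is interleaved with the per-prompt diffusion denoising updates.

```python
import numpy as np

def blend_zoom_stack(levels, zoom=2):
    """One consistency pass over a stack of zoom levels (illustrative sketch).

    levels: list of (H, W, 3) arrays; levels[i+1] depicts the central
            1/zoom x 1/zoom region of levels[i] at full resolution.
    """
    h, w, _ = levels[0].shape
    ch, cw = h // zoom, w // zoom          # size of the shared central region
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    for i in range(len(levels) - 1):
        fine = levels[i + 1]
        # Downsample the finer level by box averaging (assumption: any
        # reasonable resampling filter would do here).
        down = fine.reshape(ch, zoom, cw, zoom, 3).mean(axis=(1, 3))
        # Overwrite the center of the coarser level so both scales agree.
        levels[i][y0:y0 + ch, x0:x0 + cw] = down
    return levels
```

In the paper this kind of cross-scale reconciliation is what lets each scale follow its own text prompt while the rendered zoom video stays coherent from the widest view down to the closest one.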
FILM: Frame Interpolation for Large Motion
Fitsum Reda
Eric Tabellion
Proceedings of the European conference on computer vision (ECCV) (2022)
We present a frame interpolation algorithm that synthesizes an engaging slow-motion video from near-duplicate photos, which often exhibit large scene motion. Near-duplicate interpolation is an interesting new application, but large motion poses challenges to existing methods. To address this issue, we adapt a feature extractor that shares weights across scales and present a “scale-agnostic” motion estimator. It relies on the intuition that large motion at finer scales should be similar to small motion at coarser scales, which boosts the number of pixels available for large-motion supervision. To inpaint wide disocclusions caused by large motion and synthesize crisp frames, we propose to optimize our network with a Gram matrix loss that measures the correlation difference between features. To simplify the training process, we further propose a unified single-network approach that removes the reliance on an additional optical-flow or depth network and is trainable from frame triplets alone. Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark while performing favorably on Vimeo90K, Middlebury, and UCF101. Source code and pre-trained models are available at https://film-net.github.io.
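The Gram matrix loss mentioned above is, in essence, a feature-correlation (style) loss. A minimal PyTorch sketch, assuming feature maps from a pretrained network such as VGG; the exact feature extractor, layers, and weighting are details of the paper, not of this sketch.

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-wise correlation (Gram) matrix of a feature map.

    features: (B, C, H, W) activations from a pretrained feature extractor.
    Returns a (B, C, C) matrix normalized by the number of entries.
    """
    b, c, h, w = features.shape
    flat = features.reshape(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def gram_loss(pred_feats, target_feats):
    """Mean squared difference between Gram matrices, summed over levels.

    pred_feats / target_feats: lists of (B, C, H, W) tensors, e.g.
    activations of the synthesized and ground-truth frames.
    """
    return sum(
        torch.mean((gram_matrix(p) - gram_matrix(t)) ** 2)
        for p, t in zip(pred_feats, target_feats)
    )
```

Matching feature correlations rather than raw pixels is what encourages the network to hallucinate crisp texture inside wide disocclusions instead of blurring them.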
3D Moments from Near Duplicate Photos
Qianqian Wang
Zhengqi Li
Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
We introduce a new computational photography effect, starting from a pair of near-duplicate photos that are prevalent in people's photostreams. Combining monocular depth synthesis and optical flow, we build a novel end-to-end system that can interpolate scene motion while simultaneously allowing independent control of the camera. We use our system to create short videos with scene motion and cinematic camera motion. We compare our method against two different baselines and demonstrate that our system outperforms them both qualitatively and quantitatively on publicly available benchmark datasets.
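The "independent control of the camera" relies on having geometry: monocular depth lets each pixel be lifted into 3-D so a virtual camera can move around it. A hedged sketch of that unprojection step, assuming a pinhole intrinsic matrix K and z-depth; the paper's actual scene representation and its handling of scene motion via optical flow are considerably richer.

```python
import numpy as np

def unproject(depth: np.ndarray, intrinsics: np.ndarray) -> np.ndarray:
    """Lift a depth map to a 3-D point cloud in camera coordinates.

    depth: (H, W) z-depth per pixel.  intrinsics: 3x3 pinhole matrix K.
    Returns an (H, W, 3) array of 3-D points.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    # K^-1 [u, v, 1]^T gives the viewing ray; scaling by depth gives the point.
    rays = pixels @ np.linalg.inv(intrinsics).T
    return rays * depth[..., None]
```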
Removing an object and its shadows from a photograph
Edward Zhang
Ricardo Martin-Brualla
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
Removing objects from images is an important problem for many applications, such as in mixed reality. For believable removals, the shadows that the object casts onto the scene should also be removed. Current inpainting-based methods for object removal do not consider shadows, or at best require manually adding shadow regions to the inpainting mask. We introduce a deep learning pipeline for removing a shadow along with its caster. We leverage rough scene models in order to remove a wide variety of shadows (hard or soft, dark or subtle, large or thin) from planar surfaces with a wide variety of surface textures. We train our pipeline on synthetically rendered data, and show qualitative and quantitative results on both synthetic and real scenes.
Multi-view Image Fusion
Marc Comino Trinidad
Ricardo Martin Brualla
Florian Kainz
International Conference on Computer Vision (ICCV) (2019)
We present an end-to-end learned system for fusing multiple misaligned photographs of the same scene into a chosen target view. We demonstrate three use cases: 1) color transfer for inferring color for a monochrome view, 2) HDR fusion for merging misaligned bracketed exposures, and 3) detail transfer for reprojecting a high-definition image to the point of view of an affordable VR180 camera. While the system can be trained end-to-end, it consists of three distinct steps: feature extraction, image warping, and fusion. We present a novel cascaded feature extraction method that enables us to synergistically learn optical flow at different resolution levels. We show that this significantly improves the network’s ability to learn large disparities. Finally, we demonstrate that our alignment architecture outperforms a state-of-the-art optical flow network on the image warping task when both systems are trained in an identical manner.
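The image-warping step between feature extraction and fusion is, at its core, backward warping by a dense flow field. A minimal PyTorch sketch of that operation, assuming a per-pixel (dx, dy) flow in pixel units; producing such a field is what the learned cascaded flow estimation described above is for, and the function name here is only illustrative.

```python
import torch
import torch.nn.functional as F

def backward_warp(source: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `source` toward the target view using a dense flow field.

    source: (B, C, H, W) image or feature map from the auxiliary camera.
    flow:   (B, 2, H, W) offsets (dx, dy), in pixels, mapping target
            coordinates to source coordinates.
    """
    b, _, h, w = source.shape
    # Base grid of target pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=source.dtype, device=source.device),
        torch.arange(w, dtype=source.dtype, device=source.device),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1] as expected by grid_sample.
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(source, grid, align_corners=True)
```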
Jump: Virtual Reality Video
Robert Anderson
Carlos Hernandez Esteban
Steven M. Seitz
SIGGRAPH Asia (2016)
We present Jump, a practical system for capturing high resolution, omnidirectional stereo (ODS) video suitable for wide scale consumption in currently available virtual reality (VR) headsets. Our system consists of a video camera built using off-the-shelf components and a fully automatic stitching pipeline capable of capturing video content in the ODS format. We have discovered and analyzed the distortions inherent to ODS when used for VR display as well as those introduced by our capture method and show that they are small enough to make this approach suitable for capturing a wide variety of scenes. Our stitching algorithm produces robust results by reducing the problem to one of pairwise image interpolation followed by compositing. We introduce novel optical flow and compositing methods designed specifically for this task. Our algorithm is temporally coherent and efficient, is currently running at scale on a distributed computing platform, and is capable of processing hours of footage each day.
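ODS itself is a simple camera model: every ray originates on a small horizontal circle (roughly interpupillary distance in diameter) and is tangent to it, with opposite winding directions for the two eyes. The sketch below shows that standard projection; the axis conventions and the sign assigned to each eye are arbitrary choices for this illustration rather than taken from the paper.

```python
import numpy as np

def ods_ray(theta: float, phi: float, eye: int, ipd: float = 0.064):
    """Ray origin and direction for one omnidirectional-stereo panorama pixel.

    theta: longitude (panorama column) in radians.
    phi:   latitude (panorama row) in radians.
    eye:   +1 for one eye, -1 for the other (convention assumed here).
    ipd:   interpupillary distance in meters; the viewing circle has radius ipd/2.
    """
    r = ipd / 2.0
    # Viewing direction for this panorama pixel.
    direction = np.array([
        np.sin(theta) * np.cos(phi),
        np.sin(phi),
        -np.cos(theta) * np.cos(phi),
    ])
    # Origin on the viewing circle, perpendicular to the horizontal component
    # of the direction, so the ray is tangent to the circle.
    origin = eye * r * np.array([np.cos(theta), 0.0, np.sin(theta)])
    return origin, direction
```

Stitching the captured camera footage into this ray pattern is what reduces the problem to pairwise image interpolation between neighboring cameras followed by compositing.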