Janne Kontkanen

Janne Kontkanen

Janne Kontkanen works on machine learning, computer vision and graphics algorithms in Google’s augmented perception team. Previously he worked on Google Earth rendering technology and in the film industry. He has a PhD in computer graphics from the Aalto University (former Helsinki University of Technology) and received an academy award for his deep compositing work in DreamWorks Animation.

Research Areas

Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Generative Powers of Ten
    Xiaojuan Wang
    Steve Seitz
    Ben Mildenhall
    Pratul Srinivasan
    Dor Verbin
    Aleksander Hołyński
    Preview abstract We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. This representation allows us to render continuously zooming videos, or explore different scales of the scene interactively. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content. View details
    FILM: Frame Interpolation for Large Motion
    Fitsum Reda
    Eric Tabellion
    Proceedings of the European conference on computer vision (ECCV) (2022)
    Preview abstract We present a frame interpolation algorithm that synthesizes an engaging slow-motion video from near-duplicate photos which often exhibit large scene motion. Near-duplicates interpolation is an interesting new application, but large motion poses challenges to existing methods. To address this issue, we adapt a feature extractor that shares weights across the scales, and present a “scale-agnostic” motion estimator. It relies on the intuition that large motion at finer scales should be similar to small motion at coarser scales, which boosts the number of available pixels for large motion supervision. To inpaint wide disocclusions caused by large motion and synthesize crisp frames, we propose to optimize our network with the Gram matrix loss that measures the correlation difference between features. To simplify the training process, we further propose a unified single-network approach that removes the reliance on additional optical-flow or depth network and is trainable from frame triplets alone. Our approach outperforms state-of-the-art methods on the Xiph large motion benchmark while performing favorably on Vimeo90K, Middlebury and UCF101. Source codes and pre-trained models are available at https://film-net.github.io. View details
    3D Moments from Near Duplicate Photos
    Qianqian Wang
    Zhengqi Li
    Conference on Computer Vision and Pattern Recognition (CVPR) (2022)
    Preview abstract We introduce a new computational photography effect, starting from a pair of near duplicate photos that are prevalent in people's photostreams. Combining monocular depth synthesis and optical flow, we build a novel end-to-end system that can interpolate scene motion while simultaneously allowing independent control of the camera. We use our system to create short videos with scene motion and cinematic camera motion. We compare our method against two different baselines and demonstrate that our system outperforms them both qualitatively and quantitatively in publicly available benchmark datasets. View details
    Removing an object and its shadows from a photograph
    Edward Zhang
    Ricardo Martin-Brualla
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021)
    Preview abstract Removing objects from images is an important problem for many applications, such as in mixed reality. For believable removals, the shadows that the object casts onto the scene should also be removed. Current inpainting-based methods for object removal do not consider shadows, or at best require manually adding shadow regions to the inpainting mask. We introduce a deep learning pipeline for removing a shadow along with its caster. We leverage rough scene models in order to remove a wide variety of shadows (hard or soft, dark or subtle, large or thin) from planar surfaces with a wide variety of surface textures. We train our pipeline on synthetically rendered data, and show qualitative and quantitative results on both synthetic and real scenes. View details
    Multi-view Image Fusion
    Marc Comino Trinidad
    Ricardo Martin Brualla
    Florian Kainz
    International Conference on Computer Vision (ICCV) (2019) (to appear)
    Preview abstract We present an end-to-end learned system for fusing multiple misaligned photographs of the same scene into a chosen target view. We demonstrate three use cases: 1) color transfer for inferring color for a monochrome view, 2) HDR fusion for merging misaligned bracketed exposures, and 3) detail transfer for reprojecting a high definition image to the point of view of an affordable VR180-camera. While the system can be trained end-to-end, it consists of three distinct steps: feature extraction, image warping and fusion. We present a novel cascaded feature extraction method that enables us to synergetically learn optical flow at different resolution levels. We show that this significantly improves the network’s ability to learn large disparities. Finally, we demonstrate that our alignment architecture outperforms a state-of-the art optical flow network on the image warping task when both systems are trained in an identical manner. View details
    Preview abstract We present Jump, a practical system for capturing high resolution, omnidirectional stereo (ODS) video suitable for wide scale consumption in currently available virtual reality (VR) headsets. Our system consists of a video camera built using off-the-shelf components and a fully automatic stitching pipeline capable of capturing video content in the ODS format. We have discovered and analyzed the distortions inherent to ODS when used for VR display as well as those introduced by our capture method and show that they are small enough to make this approach suitable for capturing a wide variety of scenes. Our stitching algorithm produces robust results by reducing the problem to one of pairwise image interpolation followed by compositing. We introduce novel optical flow and compositing methods designed specifically for this task. Our algorithm is temporally coherent and efficient, is currently running at scale on a distributed computing platform, and is capable of processing hours of footage each day. View details