Google at CVPR 2018

June 18, 2018


This week, Salt Lake City hosts the 2018 Conference on Computer Vision and Pattern Recognition (CVPR 2018), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Diamond Sponsor, Google will have a strong presence at CVPR 2018 — over 200 Googlers will be in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.

If you are attending CVPR this year, please stop by our booth and chat with our researchers who are actively pursuing the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind portrait mode on the Pixel 2 and Pixel 2 XL smartphones, the Open Images V4 dataset and much more.

You can learn more about our research being presented at CVPR 2018 in the list below (Googlers highlighted in blue)

Organization
Finance Chair: Ramin Zabih
Area Chairs include: Sameer Agarwal, Aseem Agrawala, Jon Barron, Abhinav Shrivastava, Carl Vondrick, Ming-Hsuan Yang

Orals/Spotlights
Unsupervised Discovery of Object Landmarks as Structural Representations
Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee

DoubleFusion: Real-time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor
Tao Yu, Zerong Zheng, Kaiwen Guo, Jianhui Zhao, Qionghai Dai, Hao Li, Gerard Pons-Moll, Yebin Liu

Neural Kinematic Networks for Unsupervised Motion Retargetting
Ruben Villegas, Jimei Yang, Duygu Ceylan, Honglak Lee

Burst Denoising with Kernel Prediction Networks
Ben Mildenhall, Jiawen Chen, Jonathan Barron, Robert Carroll, Dillon Sharlet, Ren Ng

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference Benoit Jacob, Skirmantas Kligys, Bo Chen, Matthew Tang, Menglong Zhu, Andrew Howard, Dmitry Kalenichenko, Hartwig Adam

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

Focal Visual-Text Attention for Visual Question Answering
Junwei Liang, Lu Jiang, Liangliang Cao, Li-Jia Li, Alexander G. Hauptmann

Inferring Light Fields from Shadows
Manel Baradad, Vickie Ye, Adam Yedida, Fredo Durand, William Freeman, Gregory Wornell, Antonio Torralba

Modifying Non-Local Variations Across Multiple Views
Tal Tlusty, Tomer Michaeli, Tali Dekel, Lihi Zelnik-Manor

Iterative Visual Reasoning Beyond Convolutions
Xinlei Chen, Li-jia Li, Fei-Fei Li, Abhinav Gupta

Unsupervised Training for 3D Morphable Model Regression
Kyle Genova, Forrester Cole, Aaron Maschinot, Daniel Vlasic, Aaron Sarna, William Freeman

Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc Le

The iNaturalist Species Classification and Detection Dataset
Grant van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, Serge Belongie

Learning Intrinsic Image Decomposition from Watching the World
Zhengqi Li, Noah Snavely

Learning Intelligent Dialogs for Bounding Box Annotation
Ksenia Konyushkova, Jasper Uijlings, Christoph Lampert, Vittorio Ferrari

Posters
Revisiting Knowledge Transfer for Training Object Class Detectors
Jasper Uijlings, Stefan Popov, Vittorio Ferrari

Rethinking the Faster R-CNN Architecture for Temporal Action Localization
Yu-Wei Chao, Sudheendra Vijayanarasimhan, Bryan Seybold, David Ross, Jia Deng, Rahul Sukthankar

Hierarchical Novelty Detection for Visual Object Recognition
Kibok Lee, Kimin Lee, Kyle Min, Yuting Zhang, Jinwoo Shin, Honglak Lee

COCO-Stuff: Thing and Stuff Classes in Context
Holger Caesar, Jasper Uijlings, Vittorio Ferrari

Appearance-and-Relation Networks for Video Classification
Limin Wang, Wei Li, Wen Li, Luc Van Gool

MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
Ariel Gordon, Elad Eban, Bo Chen, Ofir Nachum, Tien-Ju Yang, Edward Choi

Deformable Shape Completion with Graph Convolutional Autoencoders
Or Litany, Alex Bronstein, Michael Bronstein, Ameesh Makadia

MegaDepth: Learning Single-View Depth Prediction from Internet Photos
Zhengqi Li, Noah Snavely

Unsupervised Discovery of Object Landmarks as Structural Representations
Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee

Burst Denoising with Kernel Prediction Networks
Ben Mildenhall, Jiawen Chen, Jonathan Barron, Robert Carroll, Dillon Sharlet, Ren Ng

Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
Xingyuan Sun, Jiajun Wu, Xiuming Zhang, Zhoutong Zhang, Tianfan Xue, Joshua Tenenbaum, William Freeman

Sparse, Smart Contours to Represent and Edit Images
Tali Dekel, Dilip Krishnan, Chuang Gan, Ce Liu, William Freeman

MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
Liang-Chieh Chen, Alexander Hermans, George Papandreou, Florian Schroff, Peng Wang, Hartwig Adam

Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning
Yin Cui, Yang Song, Chen Sun, Andrew Howard, Serge Belongie

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks
Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Sung Jin Hwang, George Toderici, Troy Chinen, Joel Shor

MobileNetV2: Inverted Residuals and Linear Bottlenecks
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen

ScanComplete: Large-Scale Scene Completion and Semantic Segmentation for 3D Scans
Angela Dai, Daniel Ritchie, Martin Bokeloh, Scott Reed, Juergen Sturm, Matthias Nießner

Sim2Real View Invariant Visual Servoing by Recurrent Control
Fereshteh Sadeghi, Alexander Toshev, Eric Jang, Sergey Levine

Alternating-Stereo VINS: Observability Analysis and Performance Evaluation
Mrinal Kanti Paul, Stergios Roumeliotis

Soccer on Your Tabletop
Konstantinos Rematas, Ira Kemelmacher, Brian Curless, Steve Seitz

Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
Reza Mahjourian, Martin Wicke, Anelia Angelova

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David Ross, Carl Vondrick, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik

Inferring Light Fields from Shadows
Manel Baradad, Vickie Ye, Adam Yedida, Fredo Durand, William Freeman, Gregory Wornell, Antonio Torralba

Modifying Non-Local Variations Across Multiple Views
Tal Tlusty, Tomer Michaeli, Tali Dekel, Lihi Zelnik-Manor

Aperture Supervision for Monocular Depth Estimation
Pratul Srinivasan, Rahul Garg, Neal Wadhwa, Ren Ng, Jonathan Barron

Instance Embedding Transfer to Unsupervised Video Object Segmentation
Siyang Li, Bryan Seybold, Alexey Vorobyov, Alireza Fathi, Qin Huang, C.-C. Jay Kuo

Frame-Recurrent Video Super-Resolution
Mehdi S. M. Sajjadi, Raviteja Vemulapalli, Matthew Brown

Weakly Supervised Action Localization by Sparse Temporal Pooling Network
Phuc Nguyen, Ting LiuGautam Prasad, Bohyung Han

Iterative Visual Reasoning Beyond Convolutions
Xinlei Chen, Li-jia Li, Fei-Fei Li, Abhinav Gupta

Learning and Using the Arrow of Time
Donglai Wei, Andrew Zisserman, William Freeman, Joseph Lim

HydraNets: Specialized Dynamic Architectures for Efficient Inference
Ravi Teja Mullapudi, Noam Shazeer, William Mark, Kayvon Fatahalian

Thoracic Disease Identification and Localization with Limited Supervision
Zhe Li, Chong Wang, Mei Han, Yuan Xue, Wei WeiLi-jia Li, Fei-Fei Li

Inferring Semantic Layout for Hierarchical Text-to-Image Synthesis
Seunghoon Hong, Dingdong Yang, Jongwook Choi, Honglak Lee

Deep Semantic Face Deblurring
Ziyi Shen, Wei-Sheng Lai, Tingfa Xu, Jan Kautz, Ming-Hsuan Yang

Unsupervised Training for 3D Morphable Model Regression
Kyle Genova, Forrester Cole, Aaron Maschinot, Daniel Vlasic, Aaron Sarna, William Freeman

Learning Transferable Architectures for Scalable Image Recognition
Barret Zoph, Vijay Vasudevan, Jonathon Shlens, Quoc Le

Learning Intrinsic Image Decomposition from Watching the World
Zhengqi Li, Noah Snavely

PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
Nian Liu, Junwei Han, Ming-Hsuan Yang

Mobile Video Object Detection with Temporally-Aware Feature Maps
Mason Liu, Menglong Zhu

Tutorials
Computer Vision for Robotics and Driving
Anelia Angelova, Sanja Fidler

Unsupervised Visual Learning
Pierre Sermanet, Anelia Angelova

UltraFast 3D Sensing, Reconstruction and Understanding of People, Objects and Environments
Sean Fanello, Julien Valentin, Jonathan Taylor, Christoph Rhemann, Adarsh Kowdle, Jürgen Sturm, Christine Kaeser-Chen, Pavel Pidlypenskyi, Rohit Pandey, Andrea Tagliasacchi, Sameh Khamis, David KimMingsong Dou, Kaiwen Guo, Danhang Tang, Shahram Izadi

Generative Adversarial Networks
Jun-Yan Zhu, Taesung Park, Mihaela Rosca, Phillip Isola, Ian Goodfellow