Google at CVPR 2024
The 2024 meeting of the Computer Vision and Pattern Recognition conference (CVPR 2024) is being held Monday, June 17th through Friday, June 21st in Seattle, Washington (with additional virtual content). As a leader in machine perception research and a Platinum Sponsor, Google will have a strong presence with 95+ papers being presented at the main conference and active involvement in 70+ workshops and tutorials.
Attending CVPR 2024 in person? Be sure to drop by the Google booth to learn more about how we’re actively exploring the latest techniques for application to various areas of machine perception. Visit the @GoogleAI X (formerly Twitter) and Google Research LinkedIn accounts to find out about Google booth activities (e.g., demos and Q&A sessions, which are also listed below).
Continue below to learn more about how Google researchers are engaged at CVPR 2024 (Google affiliations highlighted in bold). Also, learn about the two Google papers that received the CVPR Best Paper Award.
All session times are provided in PDT.
Board & Organizing Committee
-
Ramin Zabih
- General Chair
-
Ira Kemelmacher-Shlizerman
- Local Chair
-
Ranjay Krishna
- Local Chair
-
Boqing Gong
- Publicity Chair & Area Chair
-
Abhishek Gupta
- Area Chair
-
Aishwarya Agrawal
- Area Chair
-
Alireza Fathi
- Area Chair
-
Ameesh Makadia
- Area Chair
-
Andre Araujo
- Area Chair
-
Andrea Tagliasacchi
- Area Chair
-
Anurag Arnab
- Area Chair
-
Arsha Nagrani
- Area Chair
-
Bo Chen
- Area Chair
-
Boyi Li
- Area Chair
-
Carl Doersch
- Area Chair
-
Chen Sun
- Area Chair
-
Chen Wang
- Area Chair
-
Dahun Kim
- Area Chair
-
Dima Damen
- Area Chair
-
Dong Chen
- Area Chair
-
Evan Shelhamer
- Area Chair
-
Federico Tombari
- Area Chair
-
Hao Chen
- Area Chair
-
Jasper Uijlings
- Area Chair
-
Jian Wang
- Area Chair
-
Jordi Pont-Tuset
- Area Chair
-
Matthew Brown
- Area Chair
-
Negar Rostamzadeh
- Area Chair
-
Noah Snavely
- Area Chair
-
Oliver Wang
- Area Chair
-
Peng Wang
- Area Chair
-
Peter Hedman
- Area Chair
-
Peyman Milanfar
- Area Chair
-
Ranjay Krishna
- Area Chair
-
Richard Zhang
- Area Chair
-
Saurabh Singh
- Area Chair
-
Subhashini Venugopalan
- Area Chair
-
Thomas Kipf
- Area Chair
-
Todd Zickler
- Area Chair
-
Xiaoming Liu
- Area Chair
-
Yi-Ting Chen
- Area Chair
-
Yin Li
- Area Chair
-
Yu-Chuan Su
- Area Chair
-
Zhengqi Li
- Area Chair
-
Zongwei Zhou
- Area Chair
Award recipients
-
CVPR Best Paper Award
Award recipients: Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski
for "Generative Image Dynamics"
-
CVPR Best Paper Award
Award recipients: Youwei Liang*, Junfeng He, Gang Li, Peizhao Li*, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katherine M. Collins*, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam
for "Rich Human Feedback for Text-to-Image Generation"
Accepted papers
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet, Varun Jampani, Amit Raj, Pramook Khungurn, Supasorn Suwajanakorn
Eclipse: Disambiguating Illumination and Materials Using Unintended Shadows
Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Todd Zickler, Pratul P. Srinivasan
Instruct-Imagen: Image Generation with Multi-Modal Instruction
Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia
Style Aligned Image Generation via Shared Attention
Amir Hertz, Andrey Voynov, Shlomi Fruchter, Daniel Cohen-Or
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu*, Otilia Stretcu, Chun-Ta Lu, Krishnamurthy Viswanathan, Kenji Hata, Enming Luo, Ranjay Krishna, Ariel Fuxman
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas, Ayça Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, Francis Engelmann
Rich Human Feedback for Text-to-Image Generation
Youwei Liang*, Junfeng He, Gang Li, Peizhao Li*, Arseniy Klimovskiy, Nicholas Carolan, Jiao Sun, Jordi Pont-Tuset, Sarah Young, Feng Yang, Junjie Ke, Krishnamurthy Dj Dvijotham, Katherine M. Collins*, Yiwen Luo, Yang Li, Kai J Kohlhoff, Deepak Ramachandran, Vidhya Navalpakkam
Generative Image Dynamics
Zhengqi Li, Richard Tucker, Noah Snavely, Aleksander Holynski
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma*, Varun Jampani, Yuanzhen Li, Dmitry Lagun, Fredo Durand, Bill Freeman, Mark Matthews
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong, Seungjun An, Seungjun Lee, Anurag Arnab, Paul Hongsuck Seo, Seungryong Kim
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu*, Yingwei Li, Nan Liu, Hao Peng, Dawei Yang, Ira Kemelmacher-Shlizerman
Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt
DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Akash Sengupta*, Thiemo Alldieck, Nikos Kolotouros, Enric Corona, Andrei Zanfir, Cristian Sminchisescu
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain, Soroosh Yazdani, Kwang Moo Yi, Andrea Tagliasacchi
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo, Renato Martins, Erickson Nascimento
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation Using Stable Diffusion
Junjiao Tian, Lavisha Aggarwal, Andrea Colaco, Zsolt Kira, Mar Gonzalez-Franco
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang
MonoAvatar++: Efficient 3D Implicit Head Avatar with Mesh-Anchored Hash Table Blendshapes
Ziqian Bai*, Feitong Tan, Sean Fanello, Rohit Pandey, Mingsong Dou, Shichen Liu, Ping Tan, Yinda Zhang
One-Shot Open Affordance Learning with Foundation Models
Gen Li, Deqing Sun, Laura Sevilla-Lara, Varun Jampani
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni, Davis Rempe, Kyle Genova, Abhijit Kundu, Justin Johnson, David Fouhey, Leonidas Guibas
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu*, Yang Zhao, Zhisheng Xiao, Tingbo Hou
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea*, Alexey Gritsenko, Cordelia Schmid, Anurag Arnab
Generative Powers of Ten
Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher-Shlizerman, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski
Readout Guidance: Learning Control from Diffusion Features
Grace Luo, Trevor Darrell, Oliver Wang, Dan B Goldman, Aleksander Holynski
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan, Difan Liu, Shantanu Agarwal, Matthew Fisher, Michael Gharbi, Oliver Wang, Alec Jacobson, Evangelos Kalogerakis
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, Sanjiv Kumar
Video Interpolation With Diffusion Models
Siddhant Jain, Daniel Watson, Eric Tabellion, Aleksander Hołyński, Ben Poole, Janne Kontkanen
CONFORM: Contrast Is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral, Enis Simsar, Federico Tombari, Pinar Yanardag
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David Lindell
Beyond First-Order Tweedie: Solving Inverse Problems Using Latent Diffusion
Litu Rout*, Yujia Chen, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu
Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel*, Carlos Esteves, Ameesh Makadia
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang
UniGS: Unified Representation for Image Generation and Segmentation
Lu Qi, Lehan Yang, Weidong Guo, Yu Xu, Bo Du, Varun Jampani, Ming-Hsuan Yang
Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Lu Jiang, Yi-Hsuan Tsai, Ming-Hsuan Yang
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao, Yinda Zhang, Yuan Li, Xiyu Zhang, Bangbang Yang, Hujun Bao, Marc Pollefeys, Guofeng Zhang, Zhaopeng Cui
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam, Andreas Veit, Ayan Chakrabarti, Sanjiv Kumar
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar
WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan*, Kaifeng Chen, Dilip Krishnan, Dina Katabi, Phillip Isola, Yonglong Tian
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu
C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video
Hyunjik Kim, Matthias Bauer, Lucas Theis, Jonathan Richard Schwarz, Emilien Dupont
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Fine-Tuning of Diffusion Models
Fei Deng*, Qifei Wang, Wei Wei*, Tingbo Hou, Matthias Grundmann
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc Van Gool, Federico Tombari
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen, Yan Di, Guangyao Zhai, Fabian Manhardt, Chenyangguang Zhang, Ruida Zhang, Federico Tombari, Nassir Navab, Benjamin Busam
MOHO: Learning Single-View Hand-Held Object Reconstruction with Multi-View Occlusion-Aware Supervision
Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc Van Gool, Federico Tombari
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-Rank Experts
Jialin Wu, Xia Hu, Yaqing Wang, Bo Pang, Radu Soricut
Pixel Aligned Language Models
Jiarui Xu*, Xingyi Zhou, Shen Yan, Xiuye Gu, Anurag Arnab, Chen Sun, Xiaolong Wang, Cordelia Schmid
Distilling Vision-Language Models on Millions of Videos
Yue Zhao*, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao, Long Zhao, Vijay Kumar BG, Yumin Suh, Dimitris N. Metaxas, Manmohan Chandraker, Samuel Schulter
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar BG, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak, Justin Chiu, Joe Heyward, Viorica Patraucean, Jiajun Shen, Antoine Miech, Andrew Zisserman, Aida Nematzadeh
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Abdul Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman M Shaker, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Eric P. Xing, Ming-Hsuan Yang, Fahad Khan
PaLI-X: On Scaling Up a Multilingual Vision and Language Model
Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic, Austin Waters, Gang Li, Ibrahim Alabdulmohsin, Lucas Beyer, Julien Amelot, Kenton Lee, Andreas Peter Steiner, Yang Li, Daniel Keysers, Anurag Arnab, Yuanzhong Xu, Keran Rong, Alexander Kolesnikov, Mojtaba Seyedhosseini, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen*, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang, Dong In Lee, MinHyuk Jang, Jong Wook Kim, Feng Yang, Sangpil Kim
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min*, Shyamal Buch, Arsha Nagrani, Minsu Cho, Cordelia Schmid
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan, Sandeep Inuganti, Yan Di, Martin Sundermeyer, Fabian Manhardt, Didier Stricker, Jason Rambach, Yu Zhang
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover
CLIP as RNN: Segment Countless Visual Concepts Without Training Endeavor
Shuyang Sun*, Runjia Li, Philip Torr, Xiuye Gu, Siyang Li
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär*, Neil Houlsby, Mostafa Dehghani, Manoj Kumar
Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch, Shen Yan, Austin Myers, Xuehan Xiong, Arsha Nagrani, Cordelia Schmid
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid, Anurag Arnab
VicTR: Video-Conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya*, Anurag Arnab, Arsha Nagrani, Michael S Ryoo
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem*, Conor McCullough, Randy Hsin, Chas Leichner, Shan Li, In Suk Chong, Andrew Howard, Lukasz Lew, Sherief Reda, Ville-Mikko Rautio, Daniele Moro
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang, Yi-Hsuan Tsai
VideoGrounding-DINO: Towards Open-Vocabulary SpatioTemporal Video Grounding
Syed Talal Wasim, Muzammal Naseer, Salman Khan, Ming-Hsuan Yang, Fahad Khan
Learning Correlation Structures for Vision Transformers
Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian, Lijie Fan*, Kaifeng Chen, Dina Katabi, Dilip Krishnan, Phillip Isola
Action-Slot: Visual Action-Centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung, Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron, Ahmet Iscen, Alireza Fathi, Cordelia Schmid
Modeling Collaborator: Enabling Subjective Vision Classification with Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal*, Aditya Avinash, Neil Gordon Alldrin, Jan Dlabal, Wenlei Zhou, Enming Luo, Otilia Stretcu, Hao Xiong, Chun-Ta Lu, Howard Zhou, Ranjay Krishna*, Ariel Fuxman, Tom Duerig
Unsupervised Key Points from Pre-trained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar, Helge Rhodin, Andrea Tagliasacchi, Kwang Moo Yi
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Lily Goli, Cody Reading, Silvia Sellan, Alec Jacobson, Andrea Tagliasacchi
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber, Aleksander Holynski, Varun Jampani, Saurabh Saxena, Noah Snavely, Abhishek Kar, Angjoo Kanazawa
SODA: Bottleneck Diffusion Models for Representation Learning
Drew A. Hudson*, Daniel Zoran, Mateusz Malinowski, Andrew K. Lampinen, Andrew Jaegle, James L. McClelland, Loic Matthey, Felix Hill, Alexander Lerchner
Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand, Daniel Rebain, Gopal Sharma, Hossam Isack, Abhishek Kar, Andrea Tagliasacchi, Kwang Moo Yi
BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih, Wei-Chiu Ma, Lorenzo Boyice, Aleksander Holynski, Forrester Cole, Brian Curless, Janne Kontkanen
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang, Chenyangguang Zhang, Yan Di, Fabian Manhardt, Xingyu Liu, Federico Tombari, Xiangyang Ji
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang*, Arjun Karpur, Bingyi Cao, Qixing Huang, André Araujo
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations
Christian Diller, Thomas Funkhouser, Angela Dai
ReconFusion: 3D Reconstruction with Diffusion Priors
Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, Aleksander Holynski
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-Based 3D Vision
Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera
Point-VOS: Pointing Up Video Object Segmentation
Idil Esen Zulfikar, Sabarinath Mahadevan, Paul Voigtlaender, Bastian Leibe
NEAT: Distilling 3D Wireframes from Neural Attraction Fields
Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu, Yujun Shen
SHINOBI: Shape and Illumination Using Neural Object Decomposition via BRDF Optimization In-the-Wild
Andreas Engelhardt*, Amit Raj, Abhishek Kar, Yuanzhen Li, Deqing Sun, Mark Boss, Yunzhi Zhang*, Ricardo Martin Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani
Learning from One Continuous Video Stream
João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael Ryoo, Victor Gomes, Anelia Angelova
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu, Siyang Lu, Yuetong Lu, Dongdong Wang, Yandong Li, Xiang Wei
Single-View Refractive Index Tomography with Neural Fields
Brandon Zhao, Aviad Levis, Liam Connor, Pratul P. Srinivasan, Katherine L. Bouman
Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain, Arun S. Suggala, Pradeep Shenoy
Tutorials & demos
-
Mon, Jun 17 | 9:00AM
Machine Unlearning in Computer Vision: Foundations and Applications
Speaker: Eleni Triantafillou
-
Tue, Jun 18 | 9:00AM
Contactless AI Healthcare Using Cameras and Wireless Sensors
Speaker: Daniel McDuff
-
Wed, Jun 19 | 10:30AM — 12:00PM
Magic Mapping: Interactive Segmentation of Satellite Imagery with Embedding Fields
Authors: Chris Brown, Sean Askay, Michal Kazmierski, William Rucklidge, Valerie Pasquarella, Evan Shelhamer
Presenter: Evan Shelhamer
Workshops
-
Mon, Jun 17 | 8:30AM — 5:30PM
Adversarial Machine Learning on Computer Vision: Robustness of Foundation Models
Organizer: Xinyun Chen
-
Mon, Jun 17 | 8:30AM — 5:30PM
AI for Content Creation (AI4CC)
Speaker: Noah Snavely
-
Mon, Jun 17 | 8:30AM — 5:30PM
AI for 3D Generation
Speaker: Aleksander Hołyński
-
Mon, Jun 17 | 8:30AM — 5:30PM
AIS: Vision, Graphics and AI for Streaming
Speakers: Lucas Theis, Kelvin Chan
-
Mon, Jun 17 | 9:00AM — 5:00PM
Causal and Object-Centric Representations for Robotics
Speaker: Thomas Kipf
-
Mon, Jun 17 | 8:00AM–12:45PM
Computer Vision for Fashion, Art, and Design
Speaker: Thomas Kipf
-
Mon, Jun 17 | 8:30AM — 5:30PM
Computer Vision for Mixed Reality
Organizer: Andrea Colaco
Speaker: Federico Tombari
-
Mon, Jun 17 | 8:30AM — 5:30PM
Computer Vision in the Wild
Organizer: Yonatan Bitton
-
Mon, Jun 17 | 12:45PM — 6:05PM
CV4Animals: Computer Vision for Animal Behavior Tracking and Modeling
Speaker: Jennifer Sun
-
Mon, Jun 17 | 8:30AM — 5:30PM
CV 20/20: A Retrospective Vision
Speakers: Dima Damen, Noah Snavely
-
Mon, Jun 17 | 9:00AM — 5:30PM
Dataset Distillation for Computer Vision
Speakers: Cho-Jui Hsieh, Zhiwei Deng
-
Mon, Jun 17 | 9:00AM — 5:30PM
EarthVision: Large Scale Computer Vision for Remote Sensing Imagery
Speaker: Dan Morris
-
Mon, Jun 17 | 8:30AM — 5:30PM
Efficient and On-Device Generation (EDGE)
Speaker: Tim Salimans
-
Mon, Jun 17 | 8:00AM — 12:35PM
Efficient Large Vision Models
Organizer: Chuo-Ling Chang
Speakers: Miki Rubinstein, Tim Salimans
-
Mon, Jun 17 | 1:30PM — 5:30PM
Ethical Considerations in Creative Applications of Computer Vision
Organizers: Remi Denton, Negar Rostamzadeh, Andrew Smart, Cindy Bennett
Speaker: Renee Shelby
-
Mon, Jun 17 | 8:00AM — 12:00PM
Face Anti-Spoofing
Organizer: Isabelle Guyon
-
Mon, Jun 17 | 8:30AM — 5:30PM
Federated Learning for Computer Vision (FedVision-2024)
Speaker: Peter Kairouz
-
Mon, Jun 17 | 9:00AM — 6:00PM
Foundation Models for Autonomous Systems
Speakers: Sherry Yang, Ted Xiao
-
Mon, Jun 17 | 8:30AM — 6:00PM
Foundation Models for Medical Vision
Speaker: Shek Azizi
-
Mon, Jun 17 | 1:00PM — 5:30PM
GenAI Media Generation Challenge for Computer Vision Workshop
Speaker: Yuanzhen Li
-
Mon, Jun 17 | 1:00PM — 5:45PM
Image Matching: Local Features and Beyond
Organizer: Noah Snavely
Speaker: Eduard Trulls
-
Mon, Jun 17 | 8:30AM — 6:15PM
Joint Egocentric Vision (EgoVis) Workshop
Co-organizer: Dima Damen
-
Mon, Jun 17 | 8:30AM — 12:00PM
Large Scale Holistic Video Understanding
Organizers: David Ross, Joao Carreira, Shyamal Buch
-
Mon, Jun 17 | 8:45AM — 5:30PM
Learning 3D with Multi-View Supervision (3DMV)
Speaker: Andrea Tagliasacchi
-
Mon, Jun 17 | 8:30AM — 5:30PM
Long-Form Video Understanding: Towards Multimodal AI Assistant and Copilot
Speaker: Dima Damen
-
Mon, Jun 17 | 8:25AM — 12:15PM
Multimodal Algorithmic Reasoning Workshop
Keynote Speakers: Petar Veličković, Pushmeet Kohli
-
Mon, Jun 17 | 8:30AM — 6:00PM
Multimodal Content Moderation (MMCM)
Speaker: Susanna Ricco
-
Mon, Jun 17 | 8:30AM — 6:00PM
New Trends in Image Restoration and Enhancement Workshop and Challenges
Organizer: Ming-Hsuan Yang
Speaker: Aleksander Hołyński
-
Mon, Jun 17 | 1:00PM — 5:45PM
Populating Empty Cities – Virtual Humans for Robotics and Autonomous Driving
Speaker: Steve Seitz
-
Mon, Jun 17 | 9:00AM — 5:30PM
Prompting in Vision
Speaker: Ivana Balazevic
-
Mon, Jun 17 | 1:20PM — 6:00PM
Rhobin 2024: Rhobin Challenge on Reconstruction of Human-Object Interaction
Speaker: Dima Damen
-
Mon, Jun 17 | 9:00AM — 6:00PM
Sight and Sound
Organizers: Bill Freeman, Arsha Nagrani
-
Mon, Jun 17 | 8:25AM — 12:35PM
SyntaGen: Harnessing Generative Models for Synthetic Visual Datasets
Speakers: David Fleet, Tali Dekel
-
Mon, Jun 17 | 8:45AM — 12:45PM
Tool-Augmented Vision
Organizers: Ahmet Iscen, Ziniu Hu, Mathilde Caron, Alireza Fathi
Speaker: Cordelia Schmid
-
Mon, Jun 17 | 8:30AM — 5:30PM
Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics
Keynote: Noah Snavely
-
Mon, Jun 17 | 8:30AM — 5:30PM
VAND 2.0: Visual Anomaly and Novelty Detection
Organizer: Yedid Hoshen
-
Mon, Jun 17 | 8:30AM — 5:30PM
ViLMa – Visual Localization and Mapping
Organizer: Dima Damen
-
Mon, Jun 17 | 8:30AM — 5:30PM
The Fifth Workshop on Fair, Data-efficient, and Trusted Computer Vision
Invited Speaker: Yu-Chuan Su
-
Mon, Jun 17 | 8:30AM — 12:00PM
2nd Workshop on Scene Graphs and Graph Representation Learning
Organizer: Federico Tombari
Invited Speaker: Bryan Perozzi
-
Mon, Jun 17 | 1:30PM — 6:00PM
Virtual Try-On
Keynote Speaker: Ira Kemelmacher-Shlizerman
-
Tue, Jun 18 | 8:30AM — 12:30PM
Advances in Radiance Fields for the Metaverse
Speakers: Jon Barron, Peter Hedman, George Kopanas
-
Tue, Jun 18 | 9:00AM — 5:00PM
Challenge on Computer Vision in the Built Environment for the Design, Construction, and Operation of Buildings
Keynote: Francis Engelmann
-
Tue, Jun 18 | 1:30PM — 6:00PM
Competition on Affective Behavior Analysis in-the-Wild
Organizer: Stefanos Zafeiriou
-
Tue, Jun 18 | 8:30AM — 5:45PM
Computer Vision with Humans in the Loop
Speaker: Ranjay Krishna
-
Tue, Jun 18 | 8:30AM — 5:30PM
Continual Learning in Computer Vision (CLVISION)
Speaker: Amal Rannen-Triki
-
Tue, Jun 18 | 8:50AM — 6:05PM
Efficient Deep Learning for Computer Vision
Organizers: Andrew Howard, Chas Leichner
-
Tue, Jun 18 | 8:30AM — 5:30PM
Equivariant Vision: From Theory to Practice
Organizer: Ameesh Makadia
Speaker: Carlos Esteves
-
Tue, Jun 18 | 8:20AM — 5:40PM
The Future of Generative Visual Art
Organizers: Jon Barron, Aleksander Hołyński
Keynotes: Jimmy Shi, Dumitru Erhan, Jack Parker-Holder, Jon Barron
-
Tue, Jun 18 | 8:45AM — 4:45PM
Fine-Grained Visual Categorization
Organizers: Jennifer Sun, Kimberly Wilber
Panelist: Bill Freeman
-
Tue, Jun 18 | 8:30AM — 12:30PM
Gaze Estimation and Prediction in the Wild
Organizer: Thabo Beeler
-
Tue, Jun 18 | 8:30AM — 5:30PM
Generative Models for Computer Vision
Organizers: Michael Niemeyer, Michael Oechsle
Speaker: Federico Tombari
-
Tue, Jun 18 | 1:00PM — 6:30PM
Implicit Neural Representation for Vision
Speaker: Hyunjik Kim
-
Tue, Jun 18 | 1:30PM — 6:00PM
Learning from Procedural Videos and Language: What is Next?
Organizer: Tengda Han
Speakers: Cordelia Schmid, Dima Damen, Antoine Miech
-
Tue, Jun 18 | 9:00AM — 6:10PM
Learning with Limited Labelled Data for Image and Video Understanding
Speakers: Eleni Triantafillou, Ming-Hsuan Yang
-
Tue, Jun 18 | 9:00AM — 6:00PM
Multimodal Learning and Applications
Keynote: Dima Damen
-
Tue, Jun 18 | 9:00AM — 5:00PM
New Frontiers for Zero-Shot Image Captioning Evaluation (NICE)
Speaker: Cordelia Schmid
-
Tue, Jun 18 | 1:30PM — 5:30PM
OpenSUN3D: Open-Vocabulary 3D Scene Understanding
Organizers: Francis Engelmann, Johanna Wald, Federico Tombari
-
Tue, Jun 18 | 1:30PM — 5:30PM
Precognition: Seeing Through the Future
Organizer: Utsav Prabhu
-
Tue, Jun 18 | 8:30AM — 12:30PM
Responsible Generative AI
Organizers: Negar Rostamzadeh, Utsav Prabhu
-
Tue, Jun 18 | 8:30AM — 5:15PM
Responsible Data
Organizers: Candice Schumann, Susanna Ricco, Courtney Heldreth, Biao Wang
-
Tue, Jun 18 | 8:30AM — 5:15PM
Safe Artificial Intelligence for All Domains (SAIAD)
Speaker: Been Kim
-
Tue, Jun 18 | 8:50AM — 12:30PM
ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge
Speakers: Federico Tombari, Jon Barron
-
Tue, Jun 18 | 8:30AM — 5:30PM
Synthetic Data for Computer Vision
Organizer: Ranjay Krishna
-
Tue, Jun 18 | 8:30AM — 12:40PM
Test-Time Adaptation: Model, Adapt Thyself! (MAT)
Lead Organizer: Evan Shelhamer
-
Tue, Jun 18 | 1:00PM — 6:30PM
The Evaluation of Generative Foundation Models
Speakers: Sadeep Jayasumana, Ranjay Krishna
-
Tue, Jun 18 | 7:50AM — 6:00PM
Transformers for Vision
Organizer: Lucas Beyer
Speakers: Hila Chefer, Antoine Miech
Panelist: Hila Chefer
-
Tue, Jun 18 | 8:45AM — 5:00PM
Vision Datasets Understanding and DataCV Challenge
Organizer: José Lezama
-
Tue, Jun 18 | 9:00AM — 6:00PM
Vision and Language for Autonomous Driving and Robotics (VLADR)
Speaker: Fei Xia
-
Tue, Jun 18 | 8:00AM — 12:05PM
VizWiz Grand Challenge: Describing Images and Videos Taken by Blind People
Speaker: Beer Changpinyo
-
Tue, Jun 18 | 8:30AM — 1:00PM
What is Next in Multimodal Foundation Models?
Speaker: Cordelia Schmid
-
Tue, Jun 18 | 8:30AM — 1:30PM
Women in Computer Vision
Speaker/Panelist: Shek Azizi
-
Tue, Jun 18 | 8:30AM — 1:30PM
Platinum sponsor: LatinX in CV (LXCV)
Google Booth Demo/Q&A Schedule
*Dates and times may be subject to change. Stop by the Google booth (#1725) for more details.
-
Wednesday, June 19 | 12:30PM — 1:00PM
Medical AI Research at Google DeepMind
Presenter: Shek Azizi
-
Wednesday, June 19 | 1:30PM — 2:00PM
TacticAI: an AI Assistant for Football Tactics
Presenter: Petar Veličković
-
Wednesday, June 19 | 2:30PM — 3:00PM
On-device Image Generation and Editing
Presenters: Yang Zhao, Karthik Raveendran, Matthias Grundmann, Zhisheng Xiao
-
Wednesday, June 19 | 3:45PM — 4:10PM
Open Set 3D Scene Understanding
Presenters: Federico Tombari, Francis Engelmann
-
Thursday, June 20 | 10:00AM — 11:00AM
Recruiting at Google Q&A
Presenter: Rachel Dean
-
Thursday, June 20 | 12:30PM — 1:00PM
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Presenter: Nataniel Ruiz
-
Thursday, June 20 | 1:30PM — 2:00PM
Genie: Generative Interactive Environments
Presenters: Jack Parker-Holder, Jimmy Shi
-
Thursday, June 20 | 2:30PM — 3:00PM
PaliGemma
Presenters: Lucas Beyer, Xiaohua Zhai, Alexander Kolesnikov
-
Thursday, June 20 | 3:45PM — 4:10PM
VideoCon: Robust Video-Language Alignment via Contrast Captions
Presenters: Yonatan Bitton, Hritik Bansal
-
Friday, June 21 | 2:30PM — 3:00PM
Interactive Mapping of the Earth
Presenter: Evan Shelhamer
* Work done while at Google