Google at ECCV 2024

September 29, 2024 to October 4, 2024 • Milan, Italy

Google Research is proud to be a Diamond Sponsor of the European Conference on Computer Vision (ECCV 2024), a biennial premier research conference in Computer Vision and Machine Learning. ECCV 2024 is being held Sunday, September 29th through Friday, October 4th in Milan, Italy. Google has a strong presence at this year’s conference with over 70 accepted papers and active involvement in over 29 workshops and tutorials. We look forward to sharing some of our extensive research and expanding our partnership with the broader computer vision research community.

Attending ECCV 2024? Be sure to visit the Google booth to chat with researchers who are actively pursuing the latest innovations in computer vision, and check out some of the scheduled booth activities (e.g., demos and Q&A sessions listed below). Visit the @GoogleAI X and Google Research LinkedIn accounts to find out more about the Google booth activities at ECCV 2024.

Take a look below to learn more about Google's technical participation at ECCV 2024.

All session times are provided in CEST.

Board & Organizing Committee

Remi Denton
- Ethics Review Committee
Jordi Pont-Tuset
- Workshops and Tutorial Chair & Area Chair
Ahmet Iscen
- Area Chair
Aishwarya Agrawal
- Area Chair
Alireza Fathi
- Area Chair
Andre Araujo
- Area Chair
Andrew Zisserman
- Area Chair
Angela Yao
- Area Chair
Arsha Nagrani
- Area Chair
Ayan Chakrabarti
- Area Chair
Bernt Schiele
- Area Chair
Cordelia Schmid
- Area Chair
Dan Xu
- Area Chair
Daniel Zoran
- Area Chair
Deqing Sun
- Area Chair
Dima Damen
- Area Chair
Du Tran
- Area Chair
Evan Shelhamer
- Area Chair
Federico Tombari
- Area Chair
Golnaz Ghiasi
- Area Chair
Joao Carreira
- Area Chair
Junhwa Hur
- Area Chair
Kenneth Marino
- Area Chair
Kevis-Kokitsi Maninis
- Area Chair
Krishna Kumar Singh
- Area Chair
Liang-Chieh Chen
- Area Chair
Long Chen
- Area Chair
Mei Chen
- Area Chair
Michael Niemeyer
- Area Chair
Michael Rubinstein
- Area Chair
Ming-Hsuan Yang
- Area Chair
Negar Rostamzadeh
- Area Chair
Olivia Wiles
- Area Chair
Richard Zhang
- Area Chair
Rodrigo Benenson
- Area Chair
Ryan Farrell
- Area Chair
Saining Xie
- Area Chair
Sayna Ebrahimi
- Area Chair
Tali Dekel
- Area Chair
Tatsuya Harada
- Area Chair
Thomas Mensink
- Area Chair
Timo Bolkart
- Area Chair
Vignesh Ramanathan
- Area Chair
Xiaoyu Wang
- Area Chair
Ying Wu
- Area Chair

Orals

Accepted papers

Diffusion Bridges for 3D Point Cloud Denoising
Mathias Vogel Hüni, Keisuke Tateno, Marc Pollefeys, Federico Tombari, Marie-Julie Rakotosaona, Francis Engelmann

D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction
Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang*, Garrett Bingham, Adams Wei Yu, Quoc V. Le, Thang Luong, Golnaz Ghiasi

MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space
Armand Comas, Di Qiu, Menglei Chai, Marcel C. Bühler, Amit Raj, Ruiqi Gao, Qiangeng Xu, Mark J Matthews, Paulo Gotardo, Octavia Camps, Sergio Orts-Escolano, Thabo Beeler

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

Nuvo: Neural UV Mapping for Unruly 3D Representations
Pratul Srinivasan, Stephan Garbin, Dor Verbin, Jonathan Barron, Ben Mildenhall

ReNoise: Real Image Inversion Through Iterative Noising
Daniel Garibi, Or Patashnik, Andrey Voynov, Hadar Averbuch-Elor, Danny Cohen-Or

Spatial-Temporal Multi-level Association for Video Object Segmentation
Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

Text-Conditioned Resampler for Long Form Video Understanding
Bruno Korbar, Yongqin Xian, Alessio Tonioni, Andrew Zisserman, Federico Tombari

ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez, Vicente Ordonez, Ruben Villegas

WordRobe: Text-Guided Generation of Textured 3D Garments
Astitva Srivastava, Pranav Manu, Amit Raj, Varun Jampani, Avinash Sharma

Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans, Shreya Pathak, Hamza Merzic, Jonathan Richard Schwarz*, Ryutaro Tanno, Olivier Henaff

Geometry Fidelity for Spherical Images
Anders Christensen*, Nooshin Mojab, Khushman Patel, Karan Ahuja, Zeynep Akata, Ole Winther, Mar Gonzalez Franco, Andrea Colaco

LookupViT: Compressing Visual Information to a Limited Number of Tokens
Rajat Koner, Gagan Jain, Prateek Jain, Volker Tresp, Sujoy Paul

MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices (see blog post)
Yang Zhao, Zhisheng Xiao, Yanwu Xu, Haolin Jia, Tingbo Hou

MVDD: Multi-View Depth Diffusion Models
Zhen Wang*, Qiangeng Xu, Feitong Tan, Menglei Chai, Shichen Liu, Rohit Pandey, Sean Fanello, Achuta Kadambi, Yinda Zhang

Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Mahmoud Afifi, Zhenhua Hu, Liang Liang

PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

PointNeRF++: A Multi-Scale, Point-Based Neural Radiance Field
Weiwei Sun, Eduard Trulls, Yang-Che Tseng, Sneha Sambandam, Gopal Sharma, Andrea Tagliasacchi, Kwang Moo Yi

Region-Centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim, Anelia Angelova, Weicheng Kuo

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Yicheng Zhu, Keren Ye, Junjie Ke, Jiahui Yu*, Leonidas Guibas, Peyman Milanfar, Feng Yang

3D Congealing: 3D-Aware Image Alignment in the Wild
Yunzhi Zhang, Zizhang Li, Amit Raj, Andreas Engelhardt, Yuanzhen Li, Tingbo Hou, Jiajun Wu, Varun Jampani

Curved Diffusion: A Generative Model With Optical Geometry Control
Andrey Voynov, Amir Hertz, Moab Arar*, Shlomi Fruchter, Daniel Cohen-Or*

GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen, Cian Eastwood*, Fabian Mentzer

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu

Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann, Siyu Tang, Jan Eric Lenssen

SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer*, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari

SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang

Volumetric Rendering with Baked Quadrature Fields
Gopal Sharma, Daniel Rebain, Kwang Moo Yi, Andrea Tagliasacchi

Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
Shreyank N Gowda*, Anurag Arnab, Jonathan Huang

SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Daniel Barath

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Tim Salzmann, Markus Ryll, Alex Bewley, Matthias Minderer

SPIRE: Semantic Prompt-Driven Image Restoration
Chenyang Qi*, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

When and How do Negative Prompts Take Effect?
Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-Level Retrieval
Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen, Giorgos Tolias

EchoScene: Indoor Scene Generation via Information Echo Over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen, Ruotong Liao, Yan Di, Nassir Navab, Federico Tombari, Benjamin Busam

Finding NeMo: Negative-Mined Mosaic Augmentation for Referring Image Segmentation
Seongsu Ha, Chaeyun Kim, Donghwa Kim, Junho Lee, Sangho Lee, Joonseok Lee

Learned Neural Physics Simulation for Articulated 3D Human Pose Reconstruction
Mykhaylo Andriluka, Baruch Tabanpour, C. Daniel Freeman, Cristian Sminchisescu

NICP: Neural ICP for 3D Human Registration at Scale
Riccardo Marin, Enric Corona, Gerard Pons-Moll

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation Without Manual Labels
Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann

Weakly Supervised 3D Object Detection via Multi-level Visual Guidance
Kuan-Chih Huang, Yi-Hsuan Tsai, Ming-Hsuan Yang

ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah*, Nataniel Ruiz, Forrester Cole, Erika Lu, Svetlana Lazebnik, Yuanzhen Li, Varun Jampani

AdaDiff: Accelerating Diffusion Models Through Step-Wise Adaptive Computation
Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu, Mahmoud Salem, Li-Jia Li, Gamaleldin F Elsayed, Mohamed Elhoseiny

Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai, Yi Yang, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane*, Zachary E Berger, Yonatan Bitton, Jaemin Cho*, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason M Baldridge

GeoGaussian: Geometry-Aware Gaussian Splatting for Scene Rendering
Yanyan Li, Chenyu Lyu, Yan Di, Guangyao Zhai, Gim Hee Lee, Federico Tombari

Improving Point-Based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Ming-Hsuan Yang, Sy-Yen Kuo

Instant 3D Human Avatar Generation Using Image Diffusion Models
Nikos Kolotouros, Thiemo Alldieck, Enric Corona, Eduard Gabriel Bazavan, Cristian Sminchisescu

Lagrangian Hashing for Compressed Neural Field Representations
Shrisudhan Govindarajan, Zeno Sambugaro, Ahan Shabhanov, Towaki Takikawa, Weiwei Sun, Daniel Rebain, Nicola Conci, Kwang Moo Yi, Andrea Tagliasacchi

Photorealistic Video Generation with Diffusion Models
Agrim Gupta*, Lijun Yu, Kihyuk Sohn, Xiuye Gu, Meera Hahn, Li Fei-Fei, Irfan Essa, Lu Jiang, Jose Lezama

3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Zihao Xiao, Longlong Jing, Shangxuan Wu, Alex Zihao Zhu, Jingwei Ji, Chiyu Max Jiang, Wei-Chih Hung, Thomas Funkhouser, Weicheng Kuo, Anelia Angelova, Yin Zhou, Shiwei Sheng

SILC: Improving Vision Language Pre-training with Self-Distillation
Muhammad Ferjad Naeem*, Yongqin Xian, Xiaohua Zhai, Lukas Hoyer*, Luc Van Gool, Federico Tombari

TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
Haisheng Fu, Jie Liang, Zhenman Fang, Jingning Han, Feng Liang, Guohe Zhang

Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
Yushi Lan*, Feitong Tan, Qiangeng Xu, Di Qiu, Kyle Genova, Zeng Huang, Rohit Pandey, Sean Fanello, Thomas Funkhouser, Chen Change Loy, Yinda Zhang

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Daniel Winter, Matan Cohen, Shlomi Fruchter, Yael Pritch, Alex Rav-Acha, Yedid Hoshen

PALM: Predicting Actions Through Language Models
Sanghwan Kim, Daoji Huang, Yongqin Xian, Otmar Hilliges, Luc Van Gool, Xi Wang

Self-Supervised Shape Completion via Involution and Implicit Correspondences
Mengya Liu, Ajad Chhatkuli, Janis Postels, Luc Van Gool, Federico Tombari

Self-Training Room Layout via Geometry-Aware Ray-Casting
Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Jonathan Lee, Yi-Hsuan Tsai, Min Sun

Score Distillation Sampling with Learned Manifold Corrective
Thiemo Alldieck, Nikos Kolotouros, Cristian Sminchisescu

Taming CLIP for Fine-Grained and Structured Visual Understanding of Museum Exhibits
Ada-Astrid Balauca, Danda Pani Paudel, Kristina Toutanova, Luc Van Gool

Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
Jae Joong Lee, Bosheng Li, Sara M Beery, Jonathan Huang, Songlin Fei, Raymond A. Yeh, Bedrich Benes

Workshops

Tutorials

Demos and Q&A at the Google Booth

*Dates and times may be subject to change. Stop by the Google booth (#41) for more details.

Tuesday, October 1 | 10:30AM - 11:00AM
Google DeepMind Media Generation
Miaosen Wang, Hang Qi, Chris Wolff, Siavash Khodadadeh, Abhishek Sharma, Norman Casagrande

Tuesday, October 1 | 4:30PM - 5:00PM
Q&A: Building a Career @ Google
Jason Zeidan, Daniel Trifunovich

Wednesday, October 2 | 10:30AM - 11:00AM
LookupViT: Efficient/Flexible ViT
Rajat Koner, Sujoy Paul, Gagan Jain

Wednesday, October 2 | 12:30PM - 1:30PM
Meet the GDM TA team
Laura Giapino, Mike Carne

Wednesday, October 2 | 4:30PM - 5:00PM
Open Vocabulary 3D Scene Understanding
Francis Engelmann, Federico Tombari

Thursday, October 3 | 10:30AM - 11:00AM
PaliGemma: A Versatile 3B VLM
Andreas Steiner

Thursday, October 3 | 12:30PM - 1:30PM
Q&A: Building a Career @ Google
Jason Zeidan, Daniel Trifunovich

Thursday, October 3 | 4:30PM - 5:00PM
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Yonatan Bitton

* Work done while at Google
