
Google at CVPR 2025
The 2025 meeting of the Computer Vision and Pattern Recognition conference (CVPR 2025) is being held Wednesday, June 11th through Sunday, June 15th in Nashville. Google is proud to be a Platinum Sponsor of CVPR 2025, where researchers from Google Research, Google DeepMind and more will be contributing at all levels. This year we are presenting over 60 papers and are actively involved in a number of different events, including 36 workshops, 6 orals, and several in-booth demo sessions.
Attending CVPR 2025 in person? Stop by the Google booth (#1301) to learn more about how we’re actively exploring the latest machine learning techniques for application across the fields of computer vision and machine perception. Visit the @GoogleResearch X and Google Research LinkedIn accounts for announcements about Google booth activities (e.g., demos and Q&A sessions, which are also listed below).
Continue below to learn more about how Google researchers are engaged at CVPR 2025 (Google affiliations highlighted in bold).
All session times are provided in CST.
Demos and Q&A at the Google Booth
-
Fri, Jun 13 | 10:00AM — 11:00AM
Discover Android XR
Presenters: Federico Tombari, Sean Fanello
-
Fri, Jun 13 | 11:30AM — 12:00PM
Discover Android XR
Presenters: Federico Tombari, Sean Fanello
-
Fri, Jun 13 | 12:30PM — 1:00PM
RefVNLI: Towards Scalable Evaluation of Subject-Driven Text-to-Image Generation
Presenter: Yonatan Bitton
-
Fri, Jun 13 | 4:00PM — 4:30PM
Video Creation by Demonstration
Presenters: Ting Liu, Hao Zhou
-
Sat, Jun 14 | 10:00AM — 10:30AM
Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline
Presenter: Yonatan Bitton
-
Sat, Jun 14 | 11:30AM — 12:00PM
Discover Android XR
Presenters: Federico Tombari, Sean Fanello
-
Sat, Jun 14 | 12:30PM — 1:00PM
Brush: Gaussian Splatting anywhere
Presenter: Arthur Brussee
-
Sat, Jun 14 | 4:30PM — 5:00PM
Scaling On-Device GPU Inference for Large Generative Models
Presenter: Grant Jensen
-
Sun, Jun 15 | 10:00AM — 11:00AM
Discover Android XR
Presenters: Federico Tombari, Sean Fanello
-
Sun, Jun 15 | 11:30AM - 12:00PM
Discover Android XR
Presenters: Federico Tombari, Sean Fanello
Keynote
Gemini Robotics, Bringing AI to the Physical World
Speaker: Carolina Parada
Sun, Jun 15 | 2:45PM — 3:45PM, Karl Dean Ballroom
Award Candidates
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
Zhengqi Li, Richard Tucker, Forrester Cole, Qianqian Wang, Linyi Jin, Vickie Ye, Angjoo Kanazawa, Aleksander Hołyński, Noah Snavely
Oral: Sat, Jun 14 | 9:00AM — 9:15AM, Karl Dean Ballroom (Oral Session 3A: 3D Computer Vision)
Poster: Sat, Jun 14 | 10:30AM — 12:30PM, ExHall D (Poster Session 3, #78)
Tutorials
Orals
-
Fri, Jun 13 | 9:00AM — 9:15AM, Karl Dean Ballroom (Oral Session 1A: Image and Video Synthesis)
Motion Prompting: Controlling Video Generation with Motion Trajectories
Daniel Geng*, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun
-
Sat, Jun 14 | 9:15AM — 9:30AM, Karl Dean Ballroom (Oral Session 3A: 3D Computer Vision)
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
Linyi Jin, Richard Tucker, Zhengqi Li, David Fouhey, Noah Snavely, Aleksander Hołyński
-
Sat, Jun 14 | 9:30AM — 9:45AM, Karl Dean Ballroom (Oral Session 3A: 3D Computer Vision)
Continuous 3D Perception Model with Persistent State
Qianqian Wang, Yifei Zhang, Aleksander Hołyński, Alexei A. Efros, Angjoo Kanazawa
-
Sat, Jun 14 | 1:15PM — 1:30PM, Karl Dean Ballroom (Oral Session 4A: Image and Video Synthesis)
Language-Guided Image Tokenization for Generation
Kaiwen Zha*, Lijun Yu, Alireza Fathi, David A. Ross, Cordelia Schmid, Dina Katabi, Xiuye Gu
-
Sun, Jun 15 | 1:45PM — 2:00PM, Karl Dean Ballroom (Oral Session 6A: 3D from Single or Multi-View Sensors)
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models
Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Hołyński
-
Sun, Jun 15 | 2:00PM — 2:15PM, Davidson Ballroom (Oral Session 6C: Video, Action, and Language)
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang*, Yujia Chen, Du Tran, Vishnu Naresh Boddeti, Wen-Sheng Chu
Accepted Papers & Highlights
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang, Heeju Ko, Euijin Choo, Sangpil Kim
A Bias-Free Training Paradigm for More General AI-Generated Image Detection
Fabrizio Guillaro, Giada Zingarini, Ben Usman, Avneesh Sud, Davide Cozzolino, Luisa Verdoliva
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao*, Nikhil Parthasarathy, Muhammad Ferjad Naeem, Talfan Evans, Samuel Albanie, Federico Tombari, Yongqin Xian, Alessio Tonioni, Olivier J. Henaff
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu, Qiang Liu, Hongliang Fei
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik, Vasileios Choutas, Eduardo Alvarado, Thabo Beeler, Marc Habermann, Christian Theobalt
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee*, Xiaohang Li, Qifei Wang, Junfeng He, Junjie Ke, Ming-Hsuan Yang, Irfan Essa, Jinwoo Shin, Feng Yang, Yinxiao Li
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler, Zhengqi Li, Noah Snavely, Ricardo Martin-Brualla
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura, Antoine Yang, Cordelia Schmid, Gul Varol
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang, Zixian Ma, Yansong Tang, Luming Tang, Yongming Rao, Wei-Chiu Ma, Ranjay Krishna
Context-Aware Multimodal Pretraining
Karsten Roth*, Zeynep Akata, Dima Damen, Ivana Balazevic, Olivier J. Henaff
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee*, Jijun Jiang, Yiran Xu*, Zhuofang Li, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang
Efficient Visual State Space Model for Image Deblurring
Lingshun Kong, Jiangxin Dong, Jinhui Tang, Ming-Hsuan Yang, Jinshan Pan
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, Diogo Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, Christian Theobalt
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
Burak Bekci, Nassir Navab, Federico Tombari, Mahdi Saleh
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj, Jackson Cothren, Khoa Luu
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
Ian Huang*, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, Alireza Fathi
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab, Cordelia Schmid
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing*, Avinab Saha*, Junfeng He, Susan Hao, Paul Vicol, Moonkyung Ryu, Gang Li, Sahil Singla, Sarah Young, Yinxiao Li, Feng Yang, Deepak Ramachandran
FRAME: Floor-Aligned Representation for Avatar Motion from Egocentric Video
Andrea Boscolo Camiletto, Jian Wang, Eduardo Alvarado, Rishabh Dabral, Thabo Beeler, Marc Habermann, Christian Theobalt
FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones
Manfred Georg, Garrett Tanzer, Saad Hassan*, Maximus Shengelia*, Esha Uboweja, Sam Sepah, Sean Forbes, Thad Starner
Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler, Justus Thies
Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation
Hadi Alzayer, Philipp Henzler, Jonathan T. Barron, Jia-Bin Huang, Pratul P. Srinivasan, Dor Verbin
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee*, Erika Lu, Sarah Rumbley, Michal Geyer, Jia-Bin Huang, Tali Dekel, Forrester Cole
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé, Luca Versari, Emilien Dupont, Hyunjik Kim, Matthias Bauer
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng, Menglei Chai, Delio Vicini, Yuxiao Zhou, Yinghao Xu, Leonidas Guibas, Gordon Wetzstein, Thabo Beeler
IM-Portrait: Learning 3D-Aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos
Yuan Li*, Ziqian Bai, Feitong Tan, Zhaopeng Cui, Sean Fanello, Yinda Zhang
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu
Learning from Streaming Video with Orthogonal Gradients
Tengda Han, Dilara Gokay, Joseph Heyward, Chuhan Zhang, Daniel Zoran, Viorica Pǎtrǎucean, João Carreira, Dima Damen, Andrew Zisserman
Learning Visual Composition through Improved Semantic Guidance
Austin Stone, Hagen Soltau, Robert Geirhos, Xi Yi, Ye Xia, Bingyi Cao, Kaifeng Chen, Abhijit Ogale, Jonathon Shlens
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu, Hong-You Chen, Quang-Huy Nguyen, Li Zhang, Wei-Lun Chao
LOGICZSL: Exploring Logic-Induced Representation for Compositional Zero-Shot Learning
Peng Wu, Xiankai Lu, Hao Hu, Yongqin Xian, Jianbing Shen, Wenguan Wang
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Enis Simsar, Thomas Hofmann, Federico Tombari, Pinar Yanardag
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes L. Schönberger, Marc Pollefeys
OFER: Occluded Face Expression Reconstruction
Pratheba Selvaraju, Victoria Abrevaya, Timo Bolkart, Rick Akkerman, Tianyu Ding, Faezeh Amjadi, Ilya Zharkov
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari*, Alessio Tonioni, Yongqin Xian, Achin Kulshrestha, Federico Tombari
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli, Prune Truong, Luc Van Gool, Federico Tombari
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
Mahtab Bigverdi, Zelun Luo, Cheng-Yu Hsieh, Ethan Shen, Dongping Chen, Linda G. Shapiro, Ranjay Krishna
Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories
Susung Hong, Johanna Karras, Ricardo Martin-Brualla, Ira Kemelmacher-Shlizerman
Poly-Autoregressive Prediction for Modeling Interactions
Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran, Shiry Ginosar, Jitendra Malik
Pose Priors from Language Models
Sanjay Subramanian, Evonne Ng, Lea Müller, Dan Klein, Shiry Ginosar, Trevor Darrell
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang, Roni Paiss, Shiran Zada, Nikhil Karnad, David E. Jacobs, Yael Pritch, Inbar Mosseri, Mike Zheng Shou, Neal Wadhwa, Nataniel Ruiz
RelationField: Relate Anything in Radiance Fields
Sebastian Koch, Johanna Wald, Mirco Colosi, Narunas Vaskevicius, Pedro Hermosilla, Federico Tombari, Timo Ropinski
Scaling Inference Time Compute for Diffusion Models
Nanye Ma*, Shangyuan Tong*, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie
SceneCrafter: Controllable Multi-View Driving Scene Editing
Zehao Zhu, Yuliang Zou, Chiyu Max Jiang, Bo Sun, Vincent Casser, Xiukun Huang, Jiahao Wang, Zhenpei Yang, Ruiqi Gao, Leonidas Guibas, Mingxing Tan, Dragomir Anguelov
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos, Marc Botet Colomer, Linus Härenstam-Nielsen, Mattia Segu, Pier Luigi Dovesi, Jussi Karlgren, Daniel Cremers, Federico Tombari, Matteo Poggi
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with Pixel-Space Diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek, Kay Lamerigts, Ruiqi Gao, Tim Salimans
SimVS: Simulating World Inconsistencies for Robust View Synthesis
Alex Trevithick, Roni Paiss, Philipp Henzler, Dor Verbin, Rundi Wu, Hadi Alzayer, Ruiqi Gao, Ben Poole, Jonathan T. Barron, Aleksander Hołyński, Ravi Ramamoorthi, Pratul P. Srinivasan
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
Wojciech Zielonka*, Stephan J. Garbin, Alexandros Lattas, George Kopanas, Paulo Gotardo, Thabo Beeler, Justus Thies, Timo Bolkart
Test-Time Visual In-Context Tuning
Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei*, Hossein Talebi, Mojtaba Sahraee-Ardakan, Vishal M. Patel, Peyman Milanfar, Mauricio Delbracio
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
Rohit Kundu, Hao Xiong, Vishal Mohanty, Athula Balachandran, Amit K. Roy-Chowdhury
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Tiago Novello, Diana Aldana, Andre Araujo, Luiz Velho
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu, Gu Wang, Ruida Zhang, Chenyangguang Zhang, Federico Tombari, Xiangyang Ji
VideoComp: Advancing Fine-Grained Compositional Alignment in Video-Text Models
Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya, Anelia Angelova
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian, Guohao Li, Philip H.S. Torr, Yoon Kim, Marzyeh Ghassemi
VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
Enric Corona, Andrei Zanfir, Eduard Gabriel Bazavan, Nikos Kolotouros, Thiemo Alldieck, Cristian Sminchisescu
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli, Alessio Tonioni, Rita Cucchiara
Workshops
-
Wed, Jun 11 | 8:45AM — 12:45PM, 106C
3D Scene Understanding for Vision, Graphics, and Robotics
Organizers: Songyou Peng
-
Wed, Jun 11 | 9:00AM — 5:00PM, 101A
3D Vision Language Model for Robotics Manipulation: Opportunities and Challenges
Speakers: Ranjay Krishna
-
Wed, Jun 11 | 9:20AM — 5:05PM, 104B
4D Vision: Modeling the Dynamic World
Speakers: Tali Dekel
Organizers: Aleksander Hołyński, Carl Doersch
-
Wed, Jun 11 | 9:00AM — 12:40PM, 202C
BEAM 2025: Benchmarking and Expanding AI Multimodal Approaches
Speakers: Andre Araujo
-
Wed, Jun 11 | 8:00AM — 12:00PM, 109
Computer Vision for Mixed Reality
Organizers: Andrea Colaco
-
Wed, Jun 11 | 9:15AM — 5:10PM, 101B
Computer Vision in the Wild
Speakers: Boqing Gong, Cordelia Schmid, Ranjay Krishna, Saining Xie
Organizers: Mu Cai
-
Wed, Jun 11 | 9:15AM — 5:00PM, 205C
CV4Science 2025: Using Computer Vision for the Sciences
Speakers: Adji Bousso Dieng
-
Wed, Jun 11 | 9:00AM — 5:30PM, 213
Demographic Diversity in Computer Vision
Speakers: Aishwarya Agrawal, Ranjay Krishna
-
Wed, Jun 11 | 1:00PM — 5:45PM, 210
Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo 2)
Speakers: Saining Xie, Aida Nematzadeh, Ranjay Krishna
Organizers: Jindong Gu, Arsha Nagrani
-
Wed, Jun 11 | 8:30AM — 5:00PM, 101C
Equivariant Vision: From Theory to Practice
Organizers: Leonidas Guibas
-
Wed, Jun 11 | 1:00PM — 5:00PM, 107B
Explainable AI for Computer Vision (XAI4CV)
Speakers: Klaus-Robert Müller, Junfeng He
-
Wed, Jun 11 | 9:00AM — 5:15PM, 104E
FGVC12: Fine-Grained Visual Categorization
Speakers: Kenneth Marino
Organizers: Christine Kaeser-Chen
-
Wed, Jun 11 | 9:00AM — 5:15PM, 214
Foundation Models Meet Embodied Agents
Speakers: Yilun Du, Ranjay Krishna
-
Wed, Jun 11 | 9:15AM — 12:00PM, 110A
GenAI Media Generation Challenge
Speakers: Saining Xie
-
Wed, Jun 11 | 9:00AM — 12:30PM, 110B
Generalization in Robotics Manipulation Workshop and Challenges
Speakers: Ranjay Krishna
Organizers: Cordelia Schmid
-
Wed, Jun 11 | 12:50PM — 6:15PM, 209A-C
How to Stand Out in the Crowd?
Speakers: Dima Damen, Saining Xie
-
Wed, Jun 11 | 8:55AM — 5:30PM, 101D
Humanoid Agents
Organizers: Leonidas Guibas
-
Wed, Jun 11 | 1:00PM — 5:30PM, 108
Image Matching: Local Features and Beyond
Speakers: Jonáš Kulhánek
Organizers: Eduard Trulls
-
Wed, Jun 11 | 8:50AM — 12:00PM, Davidson C2
Large Scale Holistic Video Understanding
Speakers: Dima Damen
Organizers: Anurag Arnab, Shyamal Buch, David Ross, João Carreira
-
Wed, Jun 11 | 9:00AM — 12:00PM, 208A
M&M: Multi-Modal Models and Medicine
Speakers: Vivek Natarajan
-
Wed, Jun 11 | 1:40PM — 6:00PM, 207A-D
Multimodal Algorithmic Reasoning
Speakers: Cordelia Schmid
-
Wed, Jun 11 | 1:30PM — 5:30PM, 107A
Multimodal Foundation Models for Biomedicine: Challenges and Opportunities
Speakers: Vivek Natarajan
-
Wed, Jun 11 | 8:00AM — 5:00PM, 101E
Navigating the Future: Ensuring Trustworthiness in Multi-Modal Open-World Intelligence
Speakers: Ming-Hsuan Yang
-
Wed, Jun 11 | 9:00AM — 7:00PM, Davidson C3
New Trends in Image Restoration and Enhancement (NTIRE)
Organizers: Ming-Hsuan Yang
-
Wed, Jun 11 | 1:20PM — 5:30PM, 110A
Photo-Realistic 3D Head Avatars (P3HA)
Speakers: Stefanos Zafeiriou
-
Wed, Jun 11 | 1:45PM — 5:25PM, Davidson C1
Real-to-Sim: Bridging the Gap between Neural Rendering and Robot Learning
Speakers: Dhruv Shah
-
Wed, Jun 11 | 9:00AM — 12:30PM, 211
Sight and Sound
Speakers: David Harwath
Organizers: Arsha Nagrani, William Freeman, Andrew Zisserman
-
Wed, Jun 11 | 9:00AM — 5:20PM, Grand C2
Synthetic Data for Computer Vision
Organizers: Ranjay Krishna
-
Wed, Jun 11 | 8:00AM — 12:25PM, 110A
Three Things Everyone Should Ask About Photorealistic Virtual Try-On
Speakers: Ira Kemelmacher-Shlizerman
-
Wed, Jun 11 | 8:50AM — 5:20PM, 102B
Uncertainty Quantification for Computer Vision
Speakers: Jindong Gu
-
Wed, Jun 11 | 9:00AM — 6:00PM, 104D
Urban Scene Modeling: Where Vision Meets Photogrammetry and Graphics (USM3D)
Speakers: Federico Tombari
-
Wed, Jun 11 | 8:30AM — 5:00PM, Grand A1
Video Large Language Models
Speakers: Cordelia Schmid
-
Wed, Jun 11 | 8:30AM — 5:40PM, 104C
Visual Perception and Learning in an Open World
Organizers: Yunhan Zhao
-
Thu, Jun 12 | 9:00AM — 5:05PM, 102B
3D Digital Twin: Progress, Challenges, and Future Directions
Organizers: Leonidas Guibas
-
Thu, Jun 12 | 8:00AM — 12:30PM, 106A
3D-LLM/VLA: Bridging Language, Vision and Action in 3D Environments
Speakers: Yilun Du
-
Thu, Jun 12 | 8:30AM — 12:30PM, 205A
Adversarial Machine Learning on Computer Vision: Foundation Models + X
Organizers: Xinyun Chen
-
Thu, Jun 12 | 1:00PM — 6:00PM, 202A
Affective & Behavior Analysis In-the-Wild
Organizers: Stefanos Zafeiriou
-
Thu, Jun 12 | 9:25AM — 4:50PM, 213
Agent in Interaction, from Humans to Robots
Speakers: Neerja Thakkar
-
Thu, Jun 12 | 8:45AM — 6:00PM, Grand A1
AI for Content Creation (AI4CC)
Speakers: Charles Herrmann
Organizers: Deqing Sun
-
Thu, Jun 12 | 12:45PM — 5:40PM, 207A-D
AI for Creative Visual Content Generation, Editing and Understanding
Speakers: Nataniel Ruiz, Du Tran, Ming-Hsuan Yang
Organizers: Ruihan Zhang
-
Thu, Jun 12 | 1:00PM — 5:00PM, 208A
Efficient and On-Device Generation (EDGE)
Organizers: Tingbo Hou, Yang Zhao, Zhisheng Xiao, Qifei Wang, Ruiqi Gao, Haolin Jia
-
Thu, Jun 12 | 8:45AM — 5:45PM, Grand B1
Egocentric Vision (EgoVis)
Speakers: Arsha Nagrani
Advisor: Dima Damen
-
Thu, Jun 12 | 9:00AM — 5:00PM, 101B
Embodied "Humans": Symbiotic Intelligence between Virtual Humans and Humanoid Robots
Speakers: Leonidas Guibas
-
Thu, Jun 12 | 1:05PM — 5:25PM, Davidson C3
Enforcing Geometric, Physical, Topological, and Functional Inductive Bias in 3D Generation
Speakers: Leonidas Guibas, Maks Ovsjanikov
-
Thu, Jun 12 | 8:00AM — 6:00PM, Grand C2
Event-Based Vision
Speakers: Priya Panda
-
Thu, Jun 12 | 1:45PM — 4:45PM, 208B
Experimental Model Auditing via Controlled Synthesis (EMACS)
Speakers: Negar Rostamzadeh
-
Thu, Jun 12 | 8:50AM — 12:30PM, 105A
LOVE: Multimodal Video Agent
Speakers: Sherry Yang
-
Thu, Jun 12 | 1:45PM — 6:00PM, 105A
Open-World 3D Scene Understanding with Foundation Models
Organizers: Johanna Wald, Federico Tombari, Leonidas Guibas
-
Thu, Jun 12 | 2:00PM — 6:00PM, 210
Perception for Industrial Robotics Automation
Organizers: Krzysztof Choromanski, Martin Sundermeyer
-
Thu, Jun 12 | 1:45PM — 5:30PM, 106C
Physics-Inspired 3D Vision and Imaging
Organizers: Dor Verbin
-
Thu, Jun 12 | 8:30AM — 5:30PM, 101E
PixFoundation: Workshop on Pixel-level Vision Foundation Models
Speakers: Cordelia Schmid
-
Thu, Jun 12 | 1:30PM — 5:30PM, 107A
Precognition: Seeing through the Future
Organizers: Utsav Prabhu
-
Thu, Jun 12 | 1:30PM — 5:30PM, 106A
ReGenAI: Responsible Generative AI
Panelists: Aishwarya Agrawal
Organizers: Negar Rostamzadeh, Utsav Prabhu
-
Thu, Jun 12 | 8:55AM — 12:30PM, 212
Rhobin2025: The Third Rhobin Challenge on Reconstruction of Human-Object Interaction
Organizers: Thiemo Alldieck
-
Thu, Jun 12 | 8:50AM — 12:30PM, 211
ScanNet++ Novel View Synthesis and 3D Semantic Understanding Challenge
Speakers: Cordelia Schmid
-
Thu, Jun 12 | 8:45AM — 12:30PM, Davidson C2
Sign Language Recognition, Translation and Production
Organizers: Liliane Momeni
-
Thu, Jun 12 | 9:00AM — 12:30PM, 109
Test-Time Scaling for Computer Vision
Speakers: Saining Xie
Organizers: Jindong Gu
-
Thu, Jun 12 | 1:20PM — 6:00PM, 209A-C
Transformers for Vision
Speakers: Wenhu Chen
-
Thu, Jun 12 | 8:30AM — 12:30PM, Davidson C1
VAND: Visual Anomaly and Novelty Detection
Organizers: Yedid Hoshen
-
Thu, Jun 12 | 9:00AM — 6:00PM, 104E
Vision Language Models For All: Building Geo-Diverse and Culturally Aware Vision-Language Models
Speakers: Roopal Garg, Negar Rostamzadeh
Organizers: Sjoerd van Steenkiste, Aishwarya Agrawal
-
Thu, Jun 12 | 9:00AM — 4:35PM, 101A
Visual Concepts
Speakers: Chen Sun
-
Thu, Jun 12 | 9:00AM — 5:00PM, 103A
Visual Generative Modeling: What’s After Diffusion?
Speakers: Bill Freeman
Organizers: Yilun Xu
-
Thu, Jun 12 | 1:00PM — 5:00PM, 105B
Visual Modeling Challenges for 2D-3D Virtual Try-On
Speakers: Ira Kemelmacher-Shlizerman
-
Thu, Jun 12 | 1:00PM — 5:30PM, Davidson C2
VizWiz Grand Challenge
Speakers: Amy Pavel
-
Thu, Jun 12 | 8:30AM — 12:15PM, 207A-D
What is Next in Multimodal Foundation Models?
Speakers: Arsha Nagrani
Panelists: Arsha Nagrani
Program Chairs: Sivan Doveh
-
Thu, Jun 12 | 8:30AM — 12:05PM, 108
WorldModelBench: Benchmarking World Foundation Models
Speakers: Wenhu Chen
Organizers: Wenhu Chen
Board & Organizing Committee
-
Cristian Sminchisescu
- Organizing Committee
-
Forrester Cole
- Workshop Chair
-
Chen Sun
- Workshop Chair
-
Neal Wadhwa
- Demonstration Chair
-
Saining Xie
- Broadening Participation Chair