Google at CVPR 2026

Google Booth Activities

Join us at the Google booth, #557, for live demos and Q&As (times are subject to change).

Fri, Jun 5 | 11:00AM — 11:30AM
Vision Banana: Image Generators are Generalist Vision Learners

A unified model that treats visual perception as an image generation task via text-guided instruction tuning. It achieves state-of-the-art performance on a diverse suite of 2D and 3D visual understanding benchmarks, showing that generative pre-training provides a powerful foundation for computer vision tasks. Bring your own image!
Presenter: Songyou Peng
Fri, Jun 5 | 12:00PM — 12:30PM
Proactive Multimodal Agents in Intelligent Eyewear

This demo features an AI agent that connects physical environments to digital tasks by interpreting first-person video from smart glasses. It demonstrates how egocentric context, such as a recognized object, can trigger automated app or browser actions for e-commerce, navigation, or media retrieval.
Presenters: Meiqi Guo, Lei Shu, Shoubin Yu, Boqing Gong
Fri, Jun 5 | 1:00PM — 1:30PM
Learning from Single-Life Videos - Can we train on the experiences of only a single individual?

How can models learn from continuous, first-person video? Join us to see the results of training on individual day-to-day experiences.
Presenters: Dima Damen, Sayna Ebrahimi, Tengda Han
Fri, Jun 5 | 4:00PM — 4:30PM & Sat, Jun 6 | 12:00PM — 12:30PM
Discover Android XR

See a live demonstration of the latest Android XR features and experiences, including cutting-edge computer vision and spatial intelligence running natively on the new Android XR platform. We are showcasing how to use XR glasses as a portable private display, Gemini on Android XR, auto-spatialization of 2D content, and more.
Presenters: Federico Tombari, Lukas Hoyer, Ivana Tosic Rodgers, Swati Jindal, Fabian Manhardt, Mario Malave, Shijie Zhou
Sat, Jun 6 | 1:00PM — 1:30PM
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

A foundational image-text encoder with spatial awareness, leading to strong results for vision and multimodal applications.
Presenters: Andre Araujo, Erik de Godoy, Gabriele Berton, Washington Ramos
Sat, Jun 6 | 4:30PM — 5:00PM
BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

A compact, 195M-parameter image-to-image diffusion model optimized for on-device use. By removing text conditioning, this multi-task architecture enables object removal, outpainting, and relighting in just 290ms on a Pixel 10, offering a fast, private, and efficient editing experience on the edge.
Presenters: Fei Deng, Yanwu Xu, Zhipeng Bao, Karthik Raveendran
Sat, Jun 6 | 5:30PM — 6:00PM
Project Astra 3D

The Project Astra 3D team presents 3DCodeBench, a benchmark designed to demonstrate the proficiency of Gemini models in generating diverse 3D objects through code execution. This work illustrates a future where Gemini models autonomously interface with software to assist artists in the automated creation of 3D assets.
Presenters: Lei Shu, Yipeng Gao

Google Booth Interactive Kiosks

Join us at the Google booth, #557, for live demos and Q&As (times are subject to change).

Kiosk 1: Fri, Jun 5 | 12:00PM — 1:00PM & Kiosk 1: Sat, Jun 6 | 12:00PM — 1:00PM
BlazeEdit: Generalist Image Editing on Mobile Devices with Image-to-Image Diffusion Models

A compact, 195M-parameter image-to-image diffusion model optimized for on-device use. By removing text conditioning, this multi-task architecture enables object removal, outpainting, and relighting in just 290ms on a Pixel 10, offering a fast, private, and efficient editing experience on the edge.
Presenters: Fei Deng, Yanwu Xu, Zhipeng Bao, Karthik Raveendran
Kiosk 2: Fri, Jun 5 | 12:00PM — 1:00PM
Efficiently Reconstructing Dynamic Scenes One 🎯 D4RT at a Time

Presenters: Skanda Koppula, Mehdi S. M. Sajjadi
Kiosk 1: Fri, Jun 5 | 4:00PM — 5:00PM
Vision Banana: Image Generators are Generalist Vision Learners

A unified model that treats visual perception as an image generation task via text-guided instruction tuning. It achieves state-of-the-art performance on a diverse suite of 2D and 3D visual understanding benchmarks, showing that generative pre-training provides a powerful foundation for computer vision tasks. Bring your own image!
Presenter: Songyou Peng
Kiosk 2: Fri, Jun 5 | 4:00PM — 5:00PM
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment

A foundational image-text encoder with spatial awareness, leading to strong results for vision and multimodal applications.
Presenters: Andre Araujo, Erik de Godoy, Gabriele Berton
Kiosk 2: Sat, Jun 6 | 12:00PM — 1:00PM
Project Genie

Experimenting with infinite, interactive worlds.
Presenter: Hang Qi
Kiosk 1: Sat, Jun 6 | 4:30PM — 5:30PM & Kiosk 1: Sun, Jun 7 | 12:00PM — 1:00PM
Discover Android XR

See a live demonstration of the latest Android XR features and experiences, including cutting-edge computer vision and spatial intelligence running natively on the new Android XR platform. We are showcasing how to use XR glasses as a portable private display, Gemini on Android XR, auto-spatialization of 2D content, and more.
Presenters: Federico Tombari, Lukas Hoyer, Ivana Tosic Rodgers, Swati Jindal, Fabian Manhardt, Mario Malave, Shijie Zhou

Highlights

Fri, Jun 5 | 10:45AM — 12:45PM, Exhibit Hall A & F (Poster Session 1, #443)
Agile Deliberation: Concept Deliberation for Subjective Visual Classification
Leijie Wang*, Otilia Stretcu, Wei Qiao, Thomas Denby, Krishnamurthy Viswanathan, Enming Luo, Chun-Ta Lu, Tushar Dogra, Ranjay Krishna, Ariel Fuxman

Fri, Jun 5 | 10:45AM — 12:45PM, Exhibit Hall A & F (Poster Session 1, #152)
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification
Qihao Liu*, Chengzhi Mao, Yaojie Liu, Alan Yuille, Wen-Sheng Chu

Fri, Jun 5 | 4:00PM — 6:00PM, Exhibit Hall A & F (Poster Session 2, #505)
OVI-MAP: Open-Vocabulary Instance-Semantic Mapping
Zilong Deng, Federico Tombari, Marc Pollefeys, Johanna Wald, Daniel Barath

Fri, Jun 5 | 4:00PM — 6:00PM, Exhibit Hall A & F (Poster Session 2, #102)
Radiance Meshes for Volumetric Reconstruction
Alexander Mai, Trevor Hedstrom, George Kopanas, Janne Kontkanen, Falko Kuester, Jonathan T. Barron

Fri, Jun 5 | 4:00PM — 6:00PM, Exhibit Hall A & F (Poster Session 2, #618)
Representing 3D Faces with Learnable B-Spline Volumes
Prashanth Chandran, Daoye Wang, Timo Bolkart

Sat, Jun 6 | 4:45PM — 6:45PM, Exhibit Hall A & F (Poster Session 4, #119)
MDS-VQA: Model-Informed Data Selection for Video Quality Assessment
Jian Zou, Xiaoyu Xu, Zhihua Wang, Yilin Wang, Balu Adsumilli, Kede Ma

Sun, Jun 7 | 11:45AM — 1:45PM, Exhibit Hall F (Poster Session 5, #396)
CURVE: A Benchmark for Cultural and Multilingual Long Video Reasoning
Darshan Singh, Arsha Nagrani, Kawshik Manikantan, Harman Singh, Dinesh Tewari, Tobias Weyand, Cordelia Schmid, Anelia Angelova, Shachi Dave

Sun, Jun 7 | 11:45AM — 1:45PM, Exhibit Hall F (Poster Session 5, #379)
Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding
Nando Metzger*, Prune Truong, Goutam Bhat, Konrad Schindler, Federico Tombari

Sun, Jun 7 | 11:45AM — 1:45PM, Exhibit Hall F (Poster Session 5, #501)
Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal
Xiaolong Qian, Qi Jiang, Lei Sun, Zongxi Yu, Kailun Yang, Peixuan Wu, Jiacheng Zhou, Yao Gao, Yaoguang Ma, Ming-Hsuan Yang, Kaiwei Wang

Sun, Jun 7 | 3:30PM — 5:30PM, Exhibit Hall A (Poster Session 6, #659)
Image Diffusion Preview with Consistency Solver
Fu-Yun Wang*, Hao Zhou, Liangzhe Yuan, Sanghyun Woo, Boqing Gong, Bohyung Han, Ming-Hsuan Yang, Han Zhang, Yukun Zhu, Ting Liu, Long Zhao

Sun, Jun 7 | 3:30PM — 5:30PM, Exhibit Hall A (Poster Session 6, #63)
A Mixed Diet Makes DINO an Omnivorous Vision Encoder
Rishabh Kabra, Maks Ovsjanikov, Drew A. Hudson, Ye Xia, Skanda Koppula, Andre Araujo, João Carreira, Niloy J. Mitra

Sun, Jun 7 | 3:30PM — 5:30PM, Exhibit Hall A (Poster Session 6, #651)
Visual Diffusion Models are Geometric Solvers
Nir Goren, Shai Yehezkel, Omer Dahary, Andrey Voynov, Or Patashnik, Daniel Cohen-Or

Accepted Papers

Archon: A Unified Multimodal Model for Holistic Digital Human Generation
Chong Bao*, Shichen Liu, Lijun Yu, David Futschik, Stylianos Moschoglou, Shefali Srivastava, Ziqian Bai, Feitong Tan, Guofeng Zhang, Zhaopeng Cui, Sean Fanello, Yinda Zhang

Beyond Objects: Contextual Synthetic Data Generation for Fine-Grained Classification
William Yang, Xindi Wu, Zhiwei Deng, Esin Tureci, Olga Russakovsky

CaptionFormer: Unified Segmentation, Tracking, and Captioning for Spatio-Temporal Objects
Gabriel Fiastre, Antoine Yang, Cordelia Schmid

CRIT: Graph-Based Automatic Data Synthesis to Enhance Cross-Modal Multi-Hop Reasoning
Junyoung Sung, Seungwoo Lyu, Minjun Kim, Sumin An, Arsha Nagrani, Paul Hongsuck Seo

Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage
Peiyu Yu*, Suraj Kothawade, Sirui Xie, Ying Nian Wu, Hongliang Fei

Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
Chuhan Zhang, Guillaume Le Moing, Skanda Koppula, Ignacio Rocco, Liliane Momeni, Junyu Xie*, Shuyang Sun, Rahul Sukthankar, Joëlle K. Barral, Raia Hadsell, Zoubin Ghahramani, Andrew Zisserman, Junlin Zhang, Mehdi S. M. Sajjadi

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
Shoubin Yu*, Lei Shu, Antoine Yang, Yao Fu, Srinivas Sunkara, Maria Wang, Jindong Chen, Mohit Bansal, Boqing Gong

ESAM++: Efficient Online 3D Perception on the Edge
Qin Liu, Lavisha Aggarwal, Saptarashmi Bandyopadhyay, Vikas Bahirwani, Marc Niethammer, Ehsan Adeli, Andrea Colaco

Eulerian Gaussian Splatting using Hashed Probability Pyramids
Mia Gaia Polansky, George Kopanas, Stephan Garbin, Todd Zickler, Dor Verbin

Feed-forward Gaussian Registration for Head Avatar Creation and Editing
Malte Prinzler*, Paulo Gotardo, Siyu Tang, Timo Bolkart

Gaze Target Estimation Anywhere with Concepts
Xu Cao, Houze Yang, Vipin Gunda, Zhongyi Zhou, Tianyu Xu, Adarsh Kowdle, Inki Kim, Jim M. Rehg

GUIDE: A Benchmark for Understanding and Assisting Users in Open-Ended GUI Tasks
Saelyne Yang, Jaesang Yu, Yi-Hao Peng, Kevin Qinghong Lin, Jae Won Cho, Yale Song, Juho Kim

Minerva-Ego: Spatiotemporal Hints for Egocentric Video Understanding
Arsha Nagrani, Jasper Uijlings, Shyamal Buch, Tobias Weyand, Sudheendra Vijayanarasimhan, Bo Hu, Ramin Mehran, David A. Ross, Cordelia Schmid

Mining Attribute Subspaces for Efficient Fine-tuning of 3D Foundation Models
Yu Jiang, Hanwen Jiang, Ahmed Abdelkader, Wen-Sheng Chu, Brandon Y. Feng, Zhangyang Wang, Qixing Huang

Mobile-VTON: High-Fidelity On-Device Virtual Try-On
Zhenchen Wan, Ce Chen, Runqi Lin, Jiaxin Huang, Tianxi Chen, Yanwu Xu, Tongliang Liu, Mingming Gong

MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments
Svitlana Morkva, Maximum Wilder-Smith, Michael Oechsle, Alessio Tonioni, Marco Hutter, Vaishakh Patil

MotionV2V: Editing Motion in a Video
Ryan Burgert, Charles Herrmann, Forrester Cole, Michael S. Ryoo, Neal Wadhwa, Andrey Voynov, Nataniel Ruiz

ORBIT: Benchmarking SfM in the Wild with 360° Video
Sara Sabour, Richard Tucker, Marcus Brubaker, Saurabh Saxena, Junhwa Hur, Andrea Tagliasacchi, Deqing Sun, David J. Fleet, Richard Szeliski, Noah Snavely

Physical Simulator In-the-Loop Video Generation
Lin Geng Foo, Mark He Huang, Alexandros Lattas, Stylianos Moschoglou, Thabo Beeler, Christian Theobalt

POGA: Paraphrased and Oppositional Graph Alignment for Fine-Grained Cross-Modal Retrieval
Junfeng Zhang, Zhe Xue, Yuankai Qi, Junping Du, Xiangyang Kong, Yishuo Yan, Amin Beheshti, Jian Yang, Anton van den Hengel, Ming-Hsuan Yang

PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation
Onkar Susladkar, Tushar Prakash, Adheesh Juvekar, Kiet A. Nguyen, Dong-Hwan Jang, Inderjit S. Dhillon, Ismini Lourentzou

Recurrent Video Masked Autoencoders
Daniel Zoran, Nikhil Parthasarathy, Yi Yang, Drew A. Hudson, João Carreira, Andrew Zisserman

Robust Promptable Video Object Segmentation
Sohyun Lee, Yeho Gwon, Lukas Hoyer, Konrad Schindler, Christos Sakaridis, Suha Kwak

SAGA: Source Attribution of Generative AI Videos
Rohit Kundu, Vishal Mohanty, Hao Xiong, Shan Jia, Athula Balachandran, Amit K. Roy-Chowdhury

Seeing Beyond 8bits: Subjective and Objective Quality Assessment of HDR-UGC Videos
Shreshth Saini, Bowen Chen, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Seeing without Pixels: Perception from Camera Trajectories
Zihui Xue*, Kristen Grauman, Dima Damen, Andrew Zisserman, Tengda Han

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving
Jiahao Wang, Bo Sun, Yijing Bai, Vincent Casser, Songyou Peng, Zehao Zhu, Meng-Li Shih, Xander Masotto, Shih-Yang Su, Kanaad Parvate, Tiancheng Ge, Linn Bieske, Dragomir Anguelov, Mingxing Tan, Chiyu Max Jiang

SpatialStack: Layered Geometry-Language Fusion for 3D VLM Spatial Reasoning
Jian Zhang, Shijie Zhou, Bangya Liu, Achuta Kadambi, Zhiwen Fan

Spherical Voronoi: Directional Appearance as a Differentiable Partition of the Sphere
Francesco Di Sario, Daniel Rebain, Dor Verbin, Marco Grangetto, Andrea Tagliasacchi

Talking Together: Synthesizing Co-Located 3D Conversations from Audio
Mengyi Shan, Shouchieh Chang, Ziqian Bai, Shichen Liu, Yinda Zhang, Luchuan Song, Rohit Pandey, Sean Fanello, Zeng Huang

TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
Bingyi Cao, Koert Chen, Kevis-Kokitsi Maninis, Kaifeng Chen*, Arjun Karpur*, Ye Xia, Sahil Dua, Tanmaya Dabral, Guangxing Han, Bohyung Han*, Joshua Ainslie, Alex Bewley, Mithun Jacob, René Wagner, Washington Ramos, Krzysztof Choromanski, Mojtaba Seyedhosseini, Howard Zhou, André Araujo

Understanding, Accelerating, and Improving MeanFlow Training
Jin-Young Kim, Hyojun Go, Lea Bogensperger, Julius Erbach, Nikolai Kalischek, Federico Tombari, Konrad Schindler, Dominik Narnhofer

Unique Lives, Shared World: Learning From Single-Life Videos
Tengda Han, Sayna Ebrahimi, Dilara Gokay, Li Yang Ku, Maks Ovsjanikov, Iva Babukova, Daniel Zoran, Viorica Pătraucean, João Carreira, Andrew Zisserman, Dima Damen

VISTA: A Test-Time Self-Improving Video Generation Agent
Do Xuan Long*, Xingchen Wan, Hootan Nakhost, Chen-Yu Lee, Tomas Pfister, Sercan Ö. Arık

VLIC: Vision-Language Models as Perceptual Judges for Human-Aligned Image Compression
Kyle Sargent, Ruiqi Gao, Philipp Henzler, Charles Herrmann, Aleksander Hołyński, Li Fei-Fei, Jiajun Wu, Jason Zhang

VULCAN: Tool-Augmented Multi Agents for Iterative 3D Object Arrangement
Zhengfei Kuang*, Rui Lin, Long Zhao, Gordon Wetzstein, Saining Xie, Sanghyun Woo

Watch and Learn: Learning to Use Computers from Online Videos
Chan Hee Song*, Yiwen Song, Palash Goyal, Yu Su, Oriana Riva, Hamid Palangi, Tomas Pfister

WaTeRFlow: Watermark Temporal Robustness via Flow Consistency
Utae Jeong, Sumin In, Hyunju Ryu, Jaewan Choi, Feng Yang, Jongheon Jeong, Seungryong Kim, Sangpil Kim

What Are You Doing? A Closer Look at Controllable Human Video Generation
Emanuele Bugliarello, Anurag Arnab, Roni Paiss, Pieter-Jan Kindermans, Cordelia Schmid

ZipMap: Linear-Time 3D Reconstruction via Test-Time Training
Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, Aleksander Hołyński

Workshops

Wednesday, June 3

Wed, Jun 3 | 7:30AM — 12:30PM, Room 506
GRAIL-V: Grounded Retrieval & Agentic Intelligence for Vision-Language
Panelist: Ming-Hsuan Yang

Wed, Jun 3 | 8:00AM — 12:55PM, Room 111
Generative AI for XR and Identity-based Applications
Speaker: Karan Ahuja

Wed, Jun 3 | 8:10AM — 12:35PM, Room 601
Multimodal Spatial Intelligence
Organizers: Phillip Y. Lee, Songyou Peng, Leonidas Guibas

Wed, Jun 3 | 8:20AM — 12:30PM, Room 113
Multimodal Alignment for a Pluralistic Society (MAPS)
Speaker: Lora Aroyo
Organizers: Negar Rostamzadeh, Aishwarya Agrawal

Wed, Jun 3 | 8:25AM — 1:00PM, Room 203
IPA: Interactive Physical AI
Speaker: Maja Matarić

Wed, Jun 3 | 8:30AM — 12:30PM, Room 607
Foundation Models for Medical Vision
Organizer: Yuyin Zhou

Wed, Jun 3 | 8:30AM — 12:30PM, Room 607
Generative AI for Sign Language
Organizer: Stefanos Zafeiriou

Wed, Jun 3 | 8:30AM — 12:00PM, Room 705/707
Video World Models: Interaction, Memory, and Efficiency
Speakers: Jack Parker-Holder, Sherry Yang

Wed, Jun 3 | 8:30AM — 1:00PM, Room 102/104
Vision-based Assistants in the Real-World
Speakers: Michael Ryoo, Yao Qin

Wed, Jun 3 | 8:30AM — 4:50PM, Room 703
Visual General Intelligence
Speaker: Robert Geirhos

Wed, Jun 3 | 9:00AM — 6:00PM, Mile High 3B
Urban Scene Modeling: Structured, Semantic, and Synthetic 3D Habitats
Speaker: Daniel Barath

Wed, Jun 3 | 9:15AM — 4:00PM, Room 605
Medical Computer Vision
Speaker: Maddie Traverse

Wed, Jun 3 | 9:30AM — 12:00PM, Room 109
Subtle Visual Computing @CVPR 2026
Speaker: Xin Liu

Wed, Jun 3 | 1:00PM — 6:00PM, Mile High 1AB
Machine Unlearning for Vision
Organizer: Bernt Schiele

Wed, Jun 3 | 1:00PM — 6:00PM, Room 709
MetaFood (MTF)
Speaker: Dima Damen
Organizer: Jinheng Xie

Wed, Jun 3 | 1:00PM — 4:40PM, Room 105
Monitoring the World through an Imperfect Lens
Organizer: Bill Freeman

Wed, Jun 3 | 1:00PM — 6:00PM, Four Seasons 4
"What is Next in Multimodal Foundation Models?"
Organizer: Sivan Doveh

Wed, Jun 3 | 1:20PM — 5:30PM, Mile High 4AB
Rediscovering Intelligence: Can AI Still Learn from Humans?
Speaker: Dima Damen

Wed, Jun 3 | 1:25PM — 5:15PM, Room 506
Test-Time Scaling for Computer Vision
Organizer: Jindong Gu

Wed, Jun 3 | 1:30PM — 5:00PM, Mile High 4EF
Multi-Agent Robotic Systems: Scaling with Compositional Intelligence
Speaker: Dhruv Shah
Organizer: Fangchen Liu

Wed, Jun 3 | 1:30PM — 5:45PM, Room 705/707
Open-World 3D Scene Understanding with Foundation Models
Speaker: Aleksander Hołyński
Organizers: Johanna Wald, Federico Tombari, Leonidas J. Guibas

Wed, Jun 3 | 1:45PM — 5:30PM, Room 607
Transformers for Vision and Multimodal AI
Speaker: Sherry Yang

Thursday, June 4

Thu, Jun 4 | 7:50AM — 12:30PM, Room 704/706
Long-Form Video Understanding, Generation and Action
Speaker: Ruben Villegas

Thu, Jun 4 | 7:55AM — 12:45PM, Room 502
Any-to-Any Multimodal Learning
Organizer: Chenyu Wang

Thu, Jun 4 | 8:00AM — 12:10PM, Exhibit Hall A 106
Computer Vision for Children
Speaker: Dima Damen
Panelist: Boqing Gong
Organizer: Zhongyi Zhou
Advisory Board: Yinda Zhang

Thu, Jun 4 | 8:00AM — 12:30PM, Room 607
Geometry-Free Novel View Synthesis and Controllable Video Models
Speaker: Aleksander Hołyński
Organizer: Leonidas Guibas

Thu, Jun 4 | 8:00AM — 12:00PM, Room 704/706
Knowledge-Intensive Multimodal Reasoning
Organizer: Wenhao Chai

Thu, Jun 4 | 8:00AM — 1:00PM, Room 504
Low-Level Vision Frontiers with Generative AI, Preference Optimization, and Agentic Systems
Speaker: Kangfu Mei

Thu, Jun 4 | 8:00AM — 5:20PM, Mile High 3B
Video Generative Models: Benchmarks and Evaluation
Speaker: Ming-Hsuan Yang
Organizers: Sicong Jiang, Yilin Wang, Pooja Verlani

Thu, Jun 4 | 8:25AM — 12:35PM, Mile High 4CD
Personalization in Generative AI
Speaker: Nataniel Ruiz

Thu, Jun 4 | 8:30AM — 5:30PM, Room 107
Embodied Artificial Intelligence
Speakers: Lewis Chiang, Ruiqi Gao

Thu, Jun 4 | 8:30AM — 12:50PM, Room 110
Physically Grounded Human Perception and Modeling
Speaker: Dima Damen
Organizer: Thabo Beeler

Thu, Jun 4 | 8:30AM — 1:00PM, Room 103
Safe Artificial Intelligence for All Domains
Organizer: Larissa Triess

Thu, Jun 4 | 8:30AM — 5:00PM, Four Seasons 4
Video Large Language Models
Speaker: Ruben Villegas
Organizers: Venkata Sai Nikhil Thodupunuri, Ravi Vayuvegula

Thu, Jun 4 | 8:35AM — 12:15PM, Room 712
Open-World Vision
Speaker: Boqing Gong
Organizer: Yunhan Zhao

Thu, Jun 4 | 8:45AM — 5:00PM, Room 205
Generative Models for Computer Vision
Speaker: Sherry Yang

Thu, Jun 4 | 8:45AM — 1:00PM, Room 709
VizWiz Grand Challenge: Interpreting Images and Videos Taken by Blind People
Speakers: Cordelia Schmid, Shaun Kane

Thu, Jun 4 | 9:00AM — 5:00PM, Room 708
Adversarial Machine Learning on Computer Vision: Safety of Vision-Language Agents
Speaker: Florian Tramèr

Thu, Jun 4 | 9:00AM — 5:00PM, Room 605
Embodied Reasoning in Action: Workshop and Challenge on Embodied Reasoning for Robotic Manipulation
Organizer: Wentao Yuan

Thu, Jun 4 | 9:00AM — 5:30PM, Room 109
Human-Interactive Generation and Editing
Speakers: Shuyang Sun, Zhengqi Li, Jack Parker-Holder

Thu, Jun 4 | 9:00AM — 11:30AM, Mile High 1CD
Sight and Sound
Organizers: Arsha Nagrani, William Freeman, Andrew Zisserman

Thu, Jun 4 | 9:05AM — 5:00PM, Room 501
Visual Concepts
Organizer: Shenhan Qian

Thu, Jun 4 | 9:35AM — 2:00PM, Mile High 4EF
UG2+ Workshop and Challenge: Bridging the Gap between Computational Photography and Visual Perception
Organizers: Patrick Rim, Hyoungseob Park

Thu, Jun 4 | 1:00PM — 6:00PM, Room 506
4D Vision: Modeling the Dynamic World
Speakers: Dima Damen, Noah Snavely
Organizer: Leonidas Guibas

Thu, Jun 4 | 1:00PM — 5:30PM, Room 2E/2H
BigMAC: Big Model Adaptation for Computer Vision
Speaker: Cordelia Schmid
Organizer: Aida Nematzadeh

Thu, Jun 4 | 1:00PM — 5:30PM, Room 709
CV4Science: Using Computer Vision for the Sciences
Speaker: Bill Freeman

Thu, Jun 4 | 1:00PM — 6:00PM, Room 603
Generative 3D Reconstruction
Speaker: Philipp Henzler
Organizers: Daniel Barath, Fabian Manhardt, Marie-Julie Rakotosaona, Michael Niemeyer, Federico Tombari, Michael Oechsle, Keisuke Tateno

Thu, Jun 4 | 1:00PM — 5:55PM, Room 504
Image Matching: Local Features and Beyond
Speaker: Paul-Edouard Sarlin
Organizer: Eduard Trulls

Thu, Jun 4 | 1:00PM — 6:00PM, Mile High 4CD
Journey to the Awards: Generative AI for Movie-Grade Video Production (J2A)
Speaker: Janne Kontkanen

Thu, Jun 4 | 1:00PM — 6:00PM, Room 110
Medical Reasoning with Vision Language Foundation Models
Organizer: Xiaoxiao Li

Thu, Jun 4 | 1:00PM — 4:50PM, Mile High 3A
Multi-Modal Reasoning for AI Agents
Organizer: Annie Chen

Thu, Jun 4 | 1:30PM — 5:30PM, Room 4AB
Appearance Understanding and Generation
Speaker: Dor Verbin

Thu, Jun 4 | 1:30PM — 5:15PM, Room 102/104
Simulation for Autonomous Driving
Organizer: Shimon Whiteson

Thu, Jun 4 | 2:00PM — 5:30PM, Room 210/212
See the World in a Different Light: Physical Appearance Modeling and Relighting in the Age of Generative AI
Speakers: Ira Kemelmacher-Shlizerman, Dor Verbin
Organizers: Jianchun Chen, Yingyan Xu

Organizing Committee

Boqing Gong, Tutorial Chair
Yale Song, Social Chair
