Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates

Shengjie Zhu; Ahmed Abdelkader; Mark Matthews; Xiaoming Liu; Vincent Chu

Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates

Shengjie Zhu

Ahmed Abdelkader

Mark Matthews

Xiaoming Liu

Vincent Chu

International Conference on 3D Vision (2026)

Download Google Scholar

Abstract

Structure-from-Motion (SfM) is a classical 3D vision task for recovering camera parameters and scene geometry from multi-view images. Recent advances in deep learning enable accurate monocular depth estimation (MDE) that infers structure from a single image without depending on camera motion. But integrating MDE into SfM remains challenging. Unlike classical triangulated sparse pointclouds, MDE produces dense depthmaps with significantly higher error variance. Inspired by modern RANSAC estimators, we propose a Marginalized Bundle Adjustment (MBA) to accommodate MDE error variance with its density. With MBA, we show that MDE depthmaps are sufficiently accurate to support SoTA or competitive results in Structure-from-Motion and camera relocalization. Our benchmark demonstrates consistent remarkable results from two-view, few-frames small multiview, to thousands-frames large multiview system. Our method highlights the significant potential of MDE on multi-view 3D vision tasks.

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Marginalized Bundle Adjustment: Multi-View Camera Pose from Monocular Depth Estimates

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs