RePAST: Relative Pose Attention Scene Representation Transformer

Aleksandr Safin; Daniel Duckworth; Mehdi S. M. Sajjadi

CVPR Workshops 3DMV & T4V Spotlight (2023)

Abstract

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.
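The invariance claim can be checked numerically with a small sketch. It is not the authors' implementation, and it does not show how RePAST injects poses into attention, which the abstract does not specify. It only illustrates the underlying geometric fact: the relative pose between two cameras is unchanged when every camera pose is re-expressed in a different global reference frame. The assumptions here are that poses are 4x4 rigid camera-to-world matrices and that the relative pose is inverse(pose_i) times pose_j. All helper names (`pose`, `relative_pose`, and so on) are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rigid_inverse(m):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    rt = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    new_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose(yaw, tx, ty, tz):
    """Hypothetical helper: camera-to-world pose from a yaw angle and a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def relative_pose(pose_i, pose_j):
    """Pose of camera j expressed in the frame of camera i (assumed convention)."""
    return matmul(rigid_inverse(pose_i), pose_j)

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two 4x4 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))

# Two illustrative camera poses (made-up values).
cam_a = pose(0.3, 1.0, 0.0, 2.0)
cam_b = pose(-1.1, 0.5, 4.0, -1.0)

# An arbitrary change of global reference frame: rotation about x plus a shift.
c, s = math.cos(0.7), math.sin(0.7)
g = [[1.0, 0.0, 0.0, -7.0],
     [0.0, c, -s, 3.0],
     [0.0, s, c, 0.5],
     [0.0, 0.0, 0.0, 1.0]]

before = relative_pose(cam_a, cam_b)
after = relative_pose(matmul(g, cam_a), matmul(g, cam_b))

# The two relative poses agree up to floating-point error.
invariance_error = max_abs_diff(before, after)
print(invariance_error < 1e-9)
```

Because only `relative_pose` outputs would reach the model in such a setup, no single camera has to be designated as the reference, which is the property the abstract describes as invariance to the choice of global reference frame.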
