CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Nikolai Kalischek
Michael Oechsle
Fabian Manhardt
Philipp Henzler
Konrad Schindler
2025

Abstract

We introduce a novel method for generating 360° panoramas from text prompts
or images. Our approach leverages recent advances in 3D generation by employing
multi-view diffusion models to jointly synthesize the six faces of a cubemap.
Unlike previous methods that rely on processing equirectangular projections or
autoregressive generation, our method treats each face as a standard perspective
image, simplifying the generation process and enabling the use of existing
multi-view diffusion models. We demonstrate that these models can be adapted to
produce high-quality cubemaps without requiring correspondence-aware attention
layers. Our model allows for fine-grained text control, generates high-resolution
panorama images, and generalizes well beyond its training set, whilst achieving
state-of-the-art results, both qualitatively and quantitatively.