DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Abstract
                We present DiffHuman, a probabilistic method for photorealistic 3D human reconstruction from a single RGB image. Despite the ill-posed nature of this problem, most methods are deterministic and output a single solution, often resulting in a lack of geometric detail and blurriness in unseen or uncertain regions. In contrast, DiffHuman predicts a distribution over 3D reconstructions conditioned on an image, which allows us to sample multiple detailed 3D avatars that are consistent with the input image. DiffHuman is implemented as a conditional diffusion model that denoises partial observations of an underlying pixel-aligned 3D representation. In testing, we can sample a 3D shape by iteratively denoising renderings of the predicted intermediate representation. Further, we introduce an additional generator neural network that approximates rendering with considerably reduced runtime (55x speed up), resulting in a novel dual-branch diffusion framework. We evaluate the effectiveness of our approach through various experiments. Our method can produce diverse, more detailed reconstructions for the parts of the person not observed in the image, and has competitive performance for the surface reconstruction of visible parts.
            
        