LOLNeRF: Learn from One Look
Abstract
We present a method for learning a generative 3D model based on neural radiance fields, trained solely from single views of objects. While generating realistic images is no longer a difficult task, producing the corresponding 3D structure so that objects can be rendered from different views is non-trivial. We show that, unlike existing methods, our approach does not need any multi-view data to achieve this goal. Specifically, we show that by learning to reconstruct many images aligned to an approximate canonical pose, with a single network conditioned on a shared latent space, one can learn a space of radiance fields that models the shape and appearance of a class of objects. We demonstrate this by training models to reconstruct a number of object categories, including humans, cats, and cars, all using datasets that contain only single views of each subject and no depth or geometry information. Our experiments show that this method achieves state-of-the-art results in novel view synthesis and monocular depth prediction.
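To make the core idea concrete, the following is a minimal sketch of a single radiance-field network shared across all training images and conditioned on a per-image latent code, rendered with the standard volume-rendering quadrature. It is an illustration under stated assumptions, not the architecture used in this work: the layer sizes, the names `radiance_field` and `render_ray`, the random initialization, and the omission of view-direction inputs are all hypothetical choices made for brevity.

```python
import math
import random

LATENT_DIM = 4  # hypothetical latent size, chosen only for illustration
HIDDEN = 8      # hypothetical hidden width


def init_field(seed=0):
    """Randomly initialize a tiny two-layer MLP (weights only, no biases)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w1": mat(HIDDEN, 3 + LATENT_DIM), "w2": mat(4, HIDDEN)}


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def radiance_field(params, point, latent):
    """Map a 3D point and a latent code to (density, rgb)."""
    hidden = [max(0.0, a) for a in matvec(params["w1"], list(point) + list(latent))]
    out = matvec(params["w2"], hidden)
    density = math.log1p(math.exp(out[0]))               # softplus: density >= 0
    rgb = [1.0 / (1.0 + math.exp(-c)) for c in out[1:]]  # sigmoid: rgb in (0, 1)
    return density, rgb


def render_ray(params, origin, direction, latent, near=0.0, far=2.0, n_samples=16):
    """Accumulate color along one ray with the usual alpha-compositing rule."""
    delta = (far - near) / n_samples
    transmittance = 1.0
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta
        point = [o + t * d for o, d in zip(origin, direction)]
        density, rgb = radiance_field(params, point, latent)
        alpha = 1.0 - math.exp(-density * delta)
        weight = transmittance * alpha
        color = [c + weight * s for c, s in zip(color, rgb)]
        transmittance *= 1.0 - alpha
    return color


# One latent code per training image; the network weights are shared by all.
rng = random.Random(1)
latents = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(3)]
params = init_field()

# The same ray rendered under two different latent codes.
pixel_a = render_ray(params, [0.0, 0.0, -1.0], [0.0, 0.0, 1.0], latents[0])
pixel_b = render_ray(params, [0.0, 0.0, -1.0], [0.0, 0.0, 1.0], latents[1])
```

In a training loop, which is omitted here, the shared weights and the per-image latent codes would be optimized jointly so that rays rendered from each image's approximate canonical pose reproduce that image's pixels. Sampling or interpolating latent codes would then select different radiance fields from the learned space.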