Novel View Synthesis from a Single Image via Point Cloud Transformation
Abstract
In this paper, we argue that true novel view synthesis of objects, in which the object can be rendered from any viewpoint, calls for an explicit 3D shape representation. To this end, we estimate point clouds, which can be freely rotated into the desired view and then projected into a new image. This novel view is inherently sparse, so the coarse projection is fed to an image completion network to obtain a dense image. To acquire the point cloud without resorting to special acquisition hardware or multi-view approaches, we estimate a pixel-wise depth map from a single RGB input image; combined with the camera intrinsics, this yields a partial point cloud. By using forward warping and backward warping between the input view and the target view, the network can be trained end-to-end without depth supervision. Experiments on the 3D ShapeNet benchmark validate the benefit of using point clouds as an explicit 3D shape representation for novel view synthesis.
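The geometric core of the pipeline described above (unprojecting a depth map with the camera intrinsics, rigidly transforming the resulting point cloud into the target view, and forward-projecting it into a sparse image) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the array shapes, function names, and the simple nearest-pixel splatting without occlusion handling are assumptions made for clarity.

```python
# Hypothetical sketch of depth unprojection and forward warping into a novel view.
import numpy as np

def unproject(depth, K):
    """Lift an HxW depth map to an (H*W, 3) point cloud in camera coordinates."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixel coords
    rays = pixels @ np.linalg.inv(K).T                                  # back-projected viewing rays
    return rays * depth.reshape(-1, 1)                                  # scale each ray by its depth

def project(points, colors, K, R, t, H, W):
    """Forward-warp colored points into the target view; returns a sparse HxWx3 image."""
    cam = points @ R.T + t                              # rigid transform into the target camera frame
    valid = cam[:, 2] > 1e-6                            # keep only points in front of the camera
    pix = (cam[valid] / cam[valid, 2:3]) @ K.T          # perspective projection with intrinsics K
    x = np.round(pix[:, 0]).astype(int)
    y = np.round(pix[:, 1]).astype(int)
    inside = (x >= 0) & (x < W) & (y >= 0) & (y < H)
    sparse = np.zeros((H, W, 3))
    sparse[y[inside], x[inside]] = colors[valid][inside]  # splat colors; many pixels remain empty
    return sparse                                          # the sparse view would feed a completion network
```

In this sketch, the sparse output of `project` corresponds to the coarse novel view mentioned in the abstract; no z-buffering or splatting kernel is modeled, and a learned image completion network would be responsible for filling the unoccupied pixels.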