- Aäron van den Oord
- Nal Kalchbrenner
Modelling the distribution of natural images is a landmark problem in unsupervised learning. We train a deep recurrent neural network to sequentially predict the pixels in an image. The network models the discrete joint probability of the raw pixel values. The distribution, though formally simple, can be arbitrarily complex and multimodal. The distribution is tractable and its ability to generalize is readily measured. Within a pixel the colors are also predicted sequentially and depend on each other and the previous context. We design two types of parallel spatial LSTM layers to make the network fast and scalable. Our main result is a compression score of 3.00 bits per color on CIFAR-10, which is considerably better than previous art. We also set new benchmarks on 32 x 32 and 64 x 64 ImageNet. Samples generated from the ImageNet model turn out general, sharp and globally coherent.