Predictor-Corrector Sampling for Discrete Diffusion Models
Abstract
This paper studies non-autoregressive transformers for image synthesis through the lens of discrete diffusion models. We find that generative methods based on non-autoregressive transformers suffer from compounding decoding errors due to the parallel sampling of visual tokens. To alleviate this, we introduce discrete predictor-corrector diffusion models (DPC). Predictor-corrector samplers are a recently introduced class of samplers for diffusion models that improve upon ancestral samplers by correcting the sampling distribution of intermediate diffusion states using MCMC methods. In DPC, the Langevin corrector, which has no direct counterpart in discrete space, is replaced with a discrete MCMC transition defined by a learned corrector kernel. The corrector kernel is trained so that the correction steps converge asymptotically, in distribution, to the true marginal of the intermediate diffusion states. Our experiments show that, equipped with DPC, discrete diffusion models achieve quality comparable to continuous diffusion models while sampling orders of magnitude faster. DPC improves upon existing discrete latent-space models for class-conditional image generation on ImageNet, and outperforms recent diffusion models and GANs according to user studies of visual quality.
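
To make the sampling procedure described above concrete, the following is a minimal sketch of a predictor-corrector loop for an absorbing (masked-token) discrete diffusion model. The predictor and corrector networks, vocabulary size, step counts, and the uniform unmasking schedule are all illustrative assumptions, not the paper's exact design: the predictor takes an ancestral step by unmasking a subset of positions, and the corrector applies a learned MCMC transition that resamples already-decoded tokens to counteract compounding decoding errors.

```python
# Hypothetical sketch of DPC-style predictor-corrector sampling for an
# absorbing-state discrete diffusion model. All names and hyperparameters
# below are assumptions for illustration only.
import torch

VOCAB_SIZE = 1024       # size of the visual-token codebook (assumed)
MASK_ID = VOCAB_SIZE    # extra index for the absorbing [MASK] state
SEQ_LEN = 256           # number of visual tokens per image (assumed)
NUM_STEPS = 16          # number of diffusion steps (assumed)
CORRECTOR_STEPS = 1     # MCMC correction steps per diffusion step (assumed)

def sample_from_logits(logits):
    """Draw one categorical sample per position from [B, L, V] logits."""
    b, l, v = logits.shape
    probs = logits.reshape(b * l, v).softmax(dim=-1)
    return torch.multinomial(probs, 1).reshape(b, l)

@torch.no_grad()
def dpc_sample(predictor, corrector, batch_size=4, device="cpu"):
    # Start from the fully masked sequence (the absorbing prior).
    x = torch.full((batch_size, SEQ_LEN), MASK_ID, device=device)
    num_unmask = SEQ_LEN // NUM_STEPS  # uniform schedule (assumed)
    for t in reversed(range(NUM_STEPS)):
        # Predictor: ancestral step, fill a random subset of masked
        # positions with tokens sampled from the predictor's distribution.
        proposal = sample_from_logits(predictor(x, t))
        for b in range(batch_size):
            masked_idx = (x[b] == MASK_ID).nonzero().squeeze(-1)
            pick = masked_idx[torch.randperm(masked_idx.numel())][:num_unmask]
            x[b, pick] = proposal[b, pick]
        # Corrector: learned MCMC kernel resamples already-decoded tokens,
        # nudging the state toward the true marginal at step t.
        for _ in range(CORRECTOR_STEPS):
            resample = sample_from_logits(corrector(x, t))
            decoded = x != MASK_ID
            x = torch.where(decoded, resample, x)
    return x

if __name__ == "__main__":
    # Stand-in networks emitting random logits, just to exercise the loop.
    dummy = lambda x, t: torch.randn(x.shape[0], SEQ_LEN, VOCAB_SIZE)
    tokens = dpc_sample(dummy, dummy)
    print(tokens.shape)  # torch.Size([4, 256])
```

In this sketch the corrector revisits every decoded position at each step; the key difference from a plain ancestral sampler is that tokens committed by earlier predictor steps are no longer frozen, which is what allows the correction steps to repair earlier parallel-sampling mistakes.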