Structured Denoising Diffusion Models in Discrete State-Spaces

Jonathan Ho
Rianne van den Berg
Advances in Neural Information Processing Systems (2021) (to appear)

Abstract

Denoising diffusion probabilistic models (DDPMs) [Ho et al. 2021] have shown impressive results on image and waveform generation in continuous state spaces. Here, we introduce Discrete Denoising Diffusion Probabilistic Models (D3PMs), diffusion-like generative models of discrete data that generalize the multinomial diffusion model of Hoogeboom et al. [2021] by going beyond transition matrices with uniform transition probabilities. This includes discrete transition matrices that mimic Gaussian kernels in continuous space, kernels based on nearest neighbors in embedding space, and kernels that introduce masking. The third allows us to draw a connection between diffusion models and autoregressive and mask-based generative models. We show that the choice of transition kernel is an important design decision that leads to improved results in image and text domains. In addition, we show that expressing transition matrices as matrix exponentials leads to efficient implementations and controllable schedules. We also introduce a new loss function that combines the variational lower bound with an auxiliary cross entropy loss. For text, this model class achieves strong results on character-level text generation and scales to LM1B with a large subword-tokenized vocabulary. On the image dataset CIFAR-10, our models achieve FID and Inception scores rivaling those of the continuous DDPM model.

Research Areas