Diversity-Sensitive Conditional Generative Adversarial Networks
Abstract
We propose a simple yet highly effective method that addresses the mode-collapse
problem in the Conditional Generative Adversarial Network (cGAN). Although
conditional distributions are multi-modal (i.e., having many modes) in practice,
most cGAN approaches tend to learn an overly simplified distribution where an
input is always mapped to a single output regardless of variations in the latent code.
To address this issue, we propose to explicitly regularize the generator to produce
diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives.
Additionally, explicit regularization of the generator allows our method to control the
balance between visual quality and diversity. We demonstrate the effectiveness
of our method on three conditional generation tasks: image-to-image translation,
image inpainting, and future video prediction. We show that simply adding
our regularization to existing models leads to surprisingly diverse generations,
substantially outperforming previous approaches for multi-modal conditional
generation that were specifically designed for each individual task.
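As a concrete illustration, the sketch below shows one plausible way to implement such a diversity regularizer in PyTorch. The pairwise distance ratio, the clamping threshold tau, and the weight lambda_ds are illustrative assumptions on our part; the abstract does not specify the exact form of the penalty.

```python
import torch

def diversity_regularizer(generator, x, z_dim, tau=10.0):
    """Hypothetical diversity penalty: reward the generator when two
    different latent codes yield different outputs for the same input x."""
    # Sample two latent codes per conditioning input (x assumed shape [B, ...]).
    z1 = torch.randn(x.size(0), z_dim, device=x.device)
    z2 = torch.randn(x.size(0), z_dim, device=x.device)
    out1 = generator(x, z1)  # assumed image output of shape [B, C, H, W]
    out2 = generator(x, z2)
    # Ratio of output distance to latent distance: a larger value means the
    # output changes more per unit change in the latent code (more diversity).
    num = torch.abs(out1 - out2).flatten(1).mean(dim=1)
    den = torch.abs(z1 - z2).mean(dim=1)
    ratio = num / (den + 1e-8)
    # Clamp at tau so the generator cannot trade realism for unbounded spread.
    return torch.clamp(ratio, max=tau).mean()

# The generator maximizes this term alongside the usual cGAN objective:
#   g_loss = cgan_generator_loss - lambda_ds * diversity_regularizer(G, x, z_dim)
# where lambda_ds controls the quality-diversity trade-off described above.
```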