In this paper, we revisit penalized maximum likelihood estimation (MLE) for exponential family distributions whose natural parameter belongs to a reproducing kernel Hilbert space. We introduce the doubly dual embedding technique, which avoids computing the partition function. It also makes it possible to learn a flexible sampler simultaneously, thereby amortizing the cost of Monte Carlo sampling in the inference stage. The estimator generalizes easily to kernel conditional exponential families. As a byproduct, we establish a connection between Wasserstein GAN and infinite-dimensional exponential family estimation, which offers a new perspective for understanding GANs.
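As a reminder of how the partition function is avoided, the following is a minimal sketch in notation assumed here for illustration: a base density $p_0$, samples $x_1,\dots,x_N$, an RKHS $\mathcal{H}$, and a regularization weight $\lambda$. It uses only the standard Fenchel (Gibbs variational) representation of the log-partition function; the second dual embedding, which handles the resulting KL term, is omitted.

```latex
% Kernel exponential family with natural parameter f in an RKHS H
% (notation assumed for illustration)
p_f(x) = p_0(x)\,\exp\bigl(f(x) - A(f)\bigr),
\qquad
A(f) = \log \int p_0(x)\,\exp\bigl(f(x)\bigr)\,dx .

% Penalized MLE
\max_{f \in \mathcal{H}} \;
\frac{1}{N}\sum_{i=1}^{N} f(x_i) \;-\; A(f) \;-\; \frac{\lambda}{2}\,\|f\|_{\mathcal{H}}^{2} .

% Fenchel dual of the log-partition function over densities q
A(f) = \max_{q} \; \mathbb{E}_{x \sim q}\bigl[f(x)\bigr] \;-\; \mathrm{KL}\bigl(q \,\|\, p_0\bigr) .

% Substituting gives a saddle-point problem with no explicit partition function
\max_{f \in \mathcal{H}} \; \min_{q} \;
\frac{1}{N}\sum_{i=1}^{N} f(x_i) \;-\; \mathbb{E}_{x \sim q}\bigl[f(x)\bigr]
\;+\; \mathrm{KL}\bigl(q \,\|\, p_0\bigr) \;-\; \frac{\lambda}{2}\,\|f\|_{\mathcal{H}}^{2} .
```

In this form the dual variable $q$ plays the role of the learned sampler, and the difference of expectations of $f$ under the data and under $q$ is the term that resembles a GAN critic objective.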
Compared with the existing score-matching-based estimator initiated by Sriperumbudur et al. (2017), our method is not only more efficient in both memory and computational cost, but also achieves a better statistical convergence rate. Empirically, the proposed estimator outperforms the current state-of-the-art methods on both conditional and unconditional kernel exponential family estimation.