Linear Time Sinkhorn Divergences using Positive Features

Meyer Scetbon
Marco Cuturi
NeurIPS 2020

Abstract

Although Sinkhorn divergences are now routinely used in data sciences to compare probability distributions, computing them remains costly, with an effort growing quadratically in the size $n$ of the support of these distributions. Indeed, the mechanics of solving optimal transport (OT) with an entropic regularization require computing an $n\times n$ kernel matrix (the neg-exponential of an $n\times n$ pairwise ground cost matrix); that kernel matrix is then repeatedly applied to a vector. We propose in this work to rely on ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$, which yields, equivalently, a kernel $k(x,y)=\langle\varphi(x),\varphi(y)\rangle$. This choice ensures that Sinkhorn divergences can be computed with an effort scaling as $O(nr)$. We provide a general framework for such cost structures, and show that several common cost functions can be approximated using this form with guarantees. Another advantage of our approach is that our approximation remains fully differentiable with respect to the distributions, as opposed to previously proposed adaptive low-rank approximations of the kernel matrix. We use that specific property to train a faster variant of OT-GAN (Salimans et al., 2018).
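To illustrate the complexity argument in the abstract, below is a minimal NumPy sketch (not the paper's own code) of Sinkhorn iterations when the kernel matrix factors as $K = \Phi_x \Phi_y^\top$ with nonnegative feature matrices of width $r$. The function name, feature construction, and stopping rule are illustrative assumptions; the point is only that each kernel application costs $O((n+m)r)$ rather than $O(nm)$.

```python
import numpy as np

def sinkhorn_low_rank(phi_x, phi_y, a, b, n_iters=100):
    """Sinkhorn scaling iterations with a factored kernel K = phi_x @ phi_y.T.

    phi_x : (n, r) nonnegative features of the source points
    phi_y : (m, r) nonnegative features of the target points
    a, b  : marginal weights of the two distributions

    Applying K (or K^T) to a vector costs O((n + m) r) instead of O(n m),
    since K v = phi_x @ (phi_y.T @ v). The transport plan is
    P = diag(u) K diag(v); here only the scalings u, v are returned.
    """
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (phi_x @ (phi_y.T @ v))   # K v, never forming K explicitly
        v = b / (phi_y @ (phi_x.T @ u))   # K^T u, same low-rank trick
    return u, v

# Toy usage with random positive features (hypothetical feature map)
rng = np.random.default_rng(0)
n, m, r = 500, 400, 30
phi_x = rng.exponential(size=(n, r))      # entries > 0, so K > 0 entrywise
phi_y = rng.exponential(size=(m, r))
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
u, v = sinkhorn_low_rank(phi_x, phi_y, a, b)
```

Because the iterations above are composed only of matrix products and divisions, they stay differentiable with respect to the inputs, which is the property the abstract highlights for GAN training.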