Coupling without Communication and Drafter-Invariant Speculative Decoding

Christopher Musco
Majid Daliri
2025

Abstract

Suppose Alice has a distribution ${P}$ and Bob has a distribution ${Q}$. Alice wants to draw a sample $a\sim {P}$ and Bob a sample $b \sim {Q}$ such that $a = b$ with as high a probability as possible. It is well known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{\text{tv}}({P},{Q})$, where $D_{\text{tv}}({P},{Q})$ is the total variation distance between ${P}$ and ${Q}$.

What if Alice and Bob must solve this same problem \emph{without communicating at all?} Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{\text{tv}}({P},{Q})}{1 + D_{\text{tv}}({P},{Q})} \geq 1-2D_{\text{tv}}({P},{Q})$ using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst case by Bavarian, Ghazi, Haramaty, Kamath, Rivest, and Sudan [ToC 2020].
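
The second inequality in the bound above follows from one line of algebra. Writing $D = D_{\text{tv}}({P},{Q}) \in [0,1]$ as shorthand (a symbol introduced here only for illustration):
\[
\frac{1 - D}{1 + D} \;=\; 1 - \frac{2D}{1 + D} \;\geq\; 1 - 2D,
\]
since $1 + D \geq 1$. For instance, at $D = 0.1$ the two communication-free bounds evaluate to $0.9/1.1 \approx 0.818$ and $0.8$, compared with $0.9$ for an optimal coupling.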


In this work, we revisit the ``communication-free coupling'' problem. We provide a simpler proof of the optimality result from [Bavarian et al., 2020]. Moreover, we show that, while the \emph{worst-case} success probability of Weighted MinHash cannot be improved, an equally simple protocol based on Gumbel sampling offers a Pareto improvement: for every pair of distributions ${P}$ and ${Q}$, Gumbel sampling achieves an equal or higher value of $\Pr[a = b]$ than Weighted MinHash.
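
To make the idea of coupling through public randomness concrete, the following is a minimal sketch of one standard way to realize Gumbel sampling, written in its equivalent exponential-race form. It is an illustration under stated assumptions, not the paper's reference implementation: the function names `coupled_sample` and `agreement_rate`, and the use of an integer seed as the public randomness, are hypothetical choices made for this example.

```python
import random


def coupled_sample(probs, seed):
    """Draw an index distributed according to `probs`, using only public randomness.

    Exponential-race form of Gumbel-max sampling: index i receives the score
    E_i / probs[i] with E_i ~ Exp(1), and the smallest score wins. This is
    equivalent to argmax_i(log probs[i] + G_i) with G_i = -log(E_i) ~ Gumbel(0, 1).
    """
    # Assumption: the shared seed stands in for the public random string.
    rng = random.Random(seed)
    best_index, best_score = None, float("inf")
    for i, p in enumerate(probs):
        # Draw for every index, even when p == 0, so that both parties
        # attach the same random variable to the same index.
        e = rng.expovariate(1.0)
        if p > 0 and e / p < best_score:
            best_index, best_score = i, e / p
    return best_index


def agreement_rate(p, q, trials=2000):
    """Estimate Pr[a = b] when Alice holds p, Bob holds q, and they never communicate."""
    hits = sum(coupled_sample(p, s) == coupled_sample(q, s) for s in range(trials))
    return hits / trials


if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]  # Alice's distribution (illustrative values)
    Q = [0.4, 0.4, 0.2]  # Bob's distribution; total variation distance 0.1 from P
    print(agreement_rate(P, Q))
```

Each party calls `coupled_sample` on its own distribution with the same seed and never sees the other's distribution. Because both attach the same exponential variable to each index, nearby distributions tend to select the same winner, while each party's output is still marginally distributed according to its own distribution.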

Importantly, this improvement translates to practice. We demonstrate an application of communication-free coupling to \emph{speculative decoding}, a recent method for accelerating autoregressive large language models [Leviathan, Kalman, Matias, ICML 2023].
We show that communication-free protocols can be used to construct \emph{drafter-invariant speculative decoding} schemes, which have the desirable property that their output is fixed given a fixed random seed, regardless of what drafter is used for speculation. In experiments on a language generation task, Gumbel sampling outperforms Weighted MinHash.
Code is available at \url{https://github.com/majid-daliri/DISD}.

Finally, we study the coupling problem in the setting where communication is \emph{bounded}, rather than completely eliminated. We describe a protocol that uses just $O(\log(n/\epsilon))$ bits of communication to achieve $\Pr[a = b] = 1 - D_{\text{tv}}({P},{Q}) - \epsilon$, i.e., to essentially match the optimal coupling.