Robust Budget Pacing with a Single Sample
Abstract
Motivated by the online advertising industry, we study the non-stationary stochastic budget management problem: an advertiser repeatedly participates in $T$ second-price auctions, where her value and the highest competing bid are drawn from unknown time-varying distributions, and she aims to maximize her total utility subject to her budget constraint. In the absence of any information about the distributions, it is known that sub-linear regret cannot be achieved. We therefore assume access to historical samples and aim to develop algorithms that are robust to discrepancies between the sampling distributions and the true distributions. We show that our Dual Follow-The-Regularized-Leader algorithm is robust and achieves a near-optimal $\tilde O(\sqrt{T})$-regret with just one sample per distribution, drastically improving over the best-known sample complexity of $T$ samples per distribution.