Improved Online Learning Algorithms for CTR Prediction in Ad Auctions

Zhe Feng
Christopher Liaw
Zixin Zhou
Google Scholar


In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner. We focus on two models of the advertisers’ strategic behaviours. First, we assume that the advertiser is completely myopic; i.e. in each round, they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight O(√T) regret in the worst-case and negative regret when the values are static across all the auctions. Next, we assume that the advertiser is non-myopic but cares about their γ-discounted long term utility. This setting is much more complex since an advertiser is incentived to “game” the mechanism by bidding strategically in earlier rounds. Advertisers may place strategic bids in earlier rounds to influence the algorithm in later rounds. In this setting, we develop an online mechanism that allows us to bound how often the advertiser misreports. Further, this mechanism achieves O(√T) regret against the optimal mechanism with truthful bids, when γ < 1. This complements the prior work that shows that Ω(T^{2/3}) regret is necessary when γ = 1, i.e. when the advertisers do not discount their utility. Furthermore, we provide an algorithm to achieve constant regret for the static valuation setting, even when γ = 1.