Ad Click Prediction: a View from the Trenches
Abstract
Predicting ad click-through rates (CTR) is a massive-scale learning
problem that is central to the multi-billion-dollar online
advertising industry. We present a selection of case studies and
topics drawn from recent experiments in the setting of a deployed
CTR prediction system. These include improvements in the context of
traditional supervised learning based on an FTRL-Proximal online
learning algorithm (which has excellent sparsity and convergence
properties) and the use of per-coordinate learning rates.
We also explore some of the challenges that arise in a real-world
system that may appear at first to be outside the domain of
traditional machine learning research. These include useful tricks
for memory savings, methods for assessing and visualizing
performance, practical methods for providing confidence estimates
for predicted probabilities, calibration methods, and methods for
automated management of features. Finally, we detail several
directions that did not turn out to be beneficial for us, despite
promising results elsewhere in the literature. The goal of this
paper is to highlight the close relationship between theoretical
advances and practical engineering in this industrial setting, and
to show the depth of challenges that appear when applying
traditional machine learning methods in a complex dynamic system.