Regret Minimization with Concept Drift
Abstract
In standard online learning, the goal of the learner is to maintain an average loss close to
the loss of the best-performing function in a fixed class. Classic results show that simple
algorithms can achieve an average loss arbitrarily close to that of the best function in
retrospect, even when input and output pairs are chosen by an adversary. However, in
many real-world applications such as spam prediction and classification of news articles,
the best target function may be drifting over time. We introduce a novel model of concept
drift in which an adversary is given control of both the distribution over inputs at each time
step and the corresponding labels. The goal of the learner is to maintain an average loss
close to the 0/1 loss of the best slowly changing sequence of functions with no more than
K large shifts. We provide regret bounds for learning in this model using an (inefficient)
reduction to the standard no-regret setting. We then provide and analyze an
efficient algorithm for learning d-dimensional hyperplanes with drift. We conclude with
some simulations illustrating the circumstances under which this algorithm outperforms
other commonly studied algorithms when the target hyperplane is drifting.
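To make the setting concrete, the sketch below is only an illustration, not the algorithm analyzed in the paper: a standard Perceptron whose weights are shrunk each round so that stale mistakes are gradually forgotten (a common heuristic for tracking a drifting hyperplane), run against a synthetic slowly drifting target. The decay factor, drift rate, and all variable names are assumptions chosen for the demonstration.

import numpy as np

rng = np.random.default_rng(0)

def perceptron_with_decay(X, y, decay=0.99):
    # Online Perceptron whose weights are shrunk each round; the
    # shrinkage bounds the influence of old mistakes, a simple
    # heuristic (not the paper's algorithm) for tracking drift.
    # Returns the sequence of 0/1 losses.
    n, d = X.shape
    w = np.zeros(d)
    losses = np.empty(n)
    for t in range(n):
        x_t, y_t = X[t], y[t]
        pred = 1.0 if w @ x_t >= 0 else -1.0
        losses[t] = float(pred != y_t)
        if pred != y_t:        # standard Perceptron update on a mistake
            w = w + y_t * x_t
        w = decay * w          # shrink: gradually forget stale updates
    return losses

# Simulate labels from a slowly drifting unit-norm target hyperplane u_t
# (drift rate 0.02 per step is an arbitrary choice for illustration).
T, d = 2000, 10
u = rng.standard_normal(d); u /= np.linalg.norm(u)
X = np.empty((T, d)); y = np.empty(T)
for t in range(T):
    u = u + 0.02 * rng.standard_normal(d)
    u /= np.linalg.norm(u)
    x = rng.standard_normal(d); x /= np.linalg.norm(x)
    X[t] = x
    y[t] = 1.0 if u @ x >= 0 else -1.0

losses = perceptron_with_decay(X, y)
print("average 0/1 loss:", losses.mean())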