Learning with Global Cost in Stochastic Environments
Abstract
We consider an online learning setting where at each time step the decision maker has to choose how
to distribute the future loss among k alternatives, and then observes the loss of each alternative,
where the losses are assumed to come from a joint distribution. Motivated by load balancing and job
scheduling, we consider a global cost function (over the losses incurred by each alternative), rather
than a summation of the instantaneous losses as done traditionally in online learning. Specifically,
we consider the global cost functions: (1) the makespan (the maximum over the alternatives) and
(2) the L_d norm (over the alternatives) for d > 1. We design algorithms that guarantee logarithmic
regret for this setting, where the regret is measured with respect to the best static decision (one
that selects the same distribution over alternatives at every time step). We also show that the least
loaded machine, a natural algorithm for minimizing the makespan, has a regret of the order of \sqrt{T} .
We complement our theoretical findings with supporting experimental results.
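The makespan setting above can be illustrated with a small simulation: we draw stochastic losses, compute the best static distribution in hindsight (which equalizes the per-machine loads), and compare it against the greedy least-loaded-machine policy. The loss distribution, horizon, and number of machines below are hypothetical choices for illustration only, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 2000, 3
# Hypothetical stochastic losses: i.i.d. uniform with a different mean per machine.
means = np.array([0.3, 0.5, 0.9])
losses = rng.uniform(0.0, 2.0 * means, size=(T, k))

# Best static distribution in hindsight: choose alpha (summing to 1) minimizing
# max_i alpha_i * L_i, where L_i is machine i's cumulative loss. The minimizer
# equalizes the loads, alpha_i proportional to 1 / L_i, giving makespan
# 1 / sum_i (1 / L_i).
L = losses.sum(axis=0)
static_opt = 1.0 / np.sum(1.0 / L)

# Least-loaded-machine policy: at each step, assign the whole unit of load to
# the machine whose accumulated load is currently smallest.
load = np.zeros(k)
for t in range(T):
    i = np.argmin(load)
    load[i] += losses[t, i]
greedy = load.max()

print(f"best static makespan:   {static_opt:.1f}")
print(f"least-loaded makespan:  {greedy:.1f}")
print(f"difference:             {greedy - static_opt:.1f}")
```

The gap between the greedy policy and the static optimum is the quantity whose growth rate (order \sqrt{T} for least loaded, versus logarithmic for the algorithms designed in the paper) is the subject of the analysis.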