On the Convergence of Regret Minimization Dynamics in Concave Games
Abstract
We consider the standard regret minimization setting, where at each time step the decision maker chooses a distribution over k alternatives and then observes the loss of each alternative. The setting is very similar to the classical online job scheduling setting, with three major differences (a formal sketch of the protocol follows the list):
Information model: in the regret minimization setting, losses are observed only after the action (assigning the job to a machine) is performed, and not before the action is selected, as is assumed in the classical online job scheduling setting.
Comparison class: in regret minimization the comparison class is the best static algorithm (i.e., a fixed distribution over alternatives), and not the optimal offline solution.
Performance measure: in regret minimization we measure the additive difference to the optimal solution in the comparison class, in contrast to the ratio used in the online job scheduling setting.
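As a minimal formalization of the protocol (the symbols $p_t$, $\ell_t$, and $L_T$ are our shorthand for this sketch, with losses assumed bounded in $[0,1]$): at each step $t = 1, \ldots, T$ the decision maker selects a distribution $p_t \in \Delta_k$ over the k alternatives, and only then observes the loss vector $\ell_t \in [0,1]^k$. One natural reading, consistent with the load balancing motivation, is that the cumulative load of alternative $i$ is the fractional assignment
\[
  L_T(i) \;=\; \sum_{t=1}^{T} p_t(i)\,\ell_t(i).
\]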
Motivated by load balancing and job scheduling, we consider a global cost function (over the losses incurred by each alternative/machine), rather than simply a sum of the instantaneous losses, as is done traditionally in regret minimization. Such global cost functions include the makespan (the maximum over the alternatives/machines) and the $L_d$ norm (over the alternatives/machines).
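In the notation of the sketch above (again our shorthand), these two running examples read
\[
  \mathrm{makespan}(L_T) \;=\; \max_{1 \le i \le k} L_T(i),
  \qquad
  \|L_T\|_d \;=\; \Big( \sum_{i=1}^{k} L_T(i)^{d} \Big)^{1/d}.
\]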
The major contribution of this work is the design of a novel regret minimization algorithm, based on calibration, that guarantees a vanishing average regret, where the regret is measured with respect to the best static decision maker, who selects the same distribution over alternatives at every time step.
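Concretely, writing $L_T^{p}(i) = p(i) \sum_{t=1}^{T} \ell_t(i)$ for the load that a fixed distribution $p \in \Delta_k$ would accumulate on alternative $i$ (our notation, matching the sketch above), vanishing average regret for a global cost function $C$ means
\[
  \frac{1}{T} \Big( C\big(L_T\big) \;-\; \min_{p \in \Delta_k} C\big(L_T^{p}\big) \Big) \;\longrightarrow\; 0
  \quad \text{as } T \to \infty .
\]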
Our results hold for a wide class of global cost functions, which include the makespan and the $L_d$ norms for $d > 1$.
In contrast, we show that for concave global cost functions, such as the $L_d$ norms for $d < 1$, the worst-case average regret does not vanish.
In addition to the general calibration-based algorithm, we provide simple and efficient algorithms for several interesting special cases.