Dueling Bandits with Team Comparisons
Abstract
We introduce the \emph{dueling teams problem}, a new online-learning setting in which the learner compares disjoint pairs of $k$-sized \emph{teams} from a universe of $n$ players.
The goal of the learner is to minimize the number of duels required to identify, with high probability, a \textit{Condorcet winning team}, i.e., a team which
wins against any other disjoint team (with a probability of at least $1/2$).
In particular, we assume a linear order on the teams which implies the existence of a Condorcet winning team.
We formalize our model by building upon the dueling bandits setting \cite{Yue2012} and provide several algorithms, both for stochastic and deterministic settings.
For the deterministic case, our algorithm identifies a Condorcet winning team after $\mathcal{O}(nk\log{k}+k\log k(\frac{\log \log k}{\Delta} + k))$ duels, where $\Delta$ is a gap parameter. In addition, we provide a gap-independent algorithm which requires $\mathcal{O}(nk\log{k}+2^{O(k)})$ duels.
For the stochastic setting, we reduce this case to the deterministic case, under the assumption that the probabilities are bounded away from $1/2$.