Dueling Bandits with Team Comparisons

Lee Cohen
Ulrike Schmidt-Kraepelin
Yishay Mansour
NeurIPS 2021(2021) (to appear)
Google Scholar


We introduce the \emph{dueling teams problem}, a new online-learning setting in which the learner compares disjoint pairs of $k$-sized \emph{teams} from a universe of $n$ players. The goal of the learner is to minimize the number of duels required to identify, with high probability, a \textit{Condorcet winning team}, i.e., a team which wins against any other disjoint team (with a probability of at least $1/2$). In particular, we assume a linear order on the teams which implies the existence of a Condorcet winning team. We formalize our model by building upon the dueling bandits setting \cite{Yue2012} and provide several algorithms, both for stochastic and deterministic settings. For the deterministic case, our algorithm identifies a Condorcet winning team after $\mathcal{O}(nk\log{k}+k\log k(\frac{\log \log k}{\Delta} + k))$ duels, where $\Delta$ is a gap parameter. In addition, we provide a gap-independent algorithm which requires $\mathcal{O}(nk\log{k}+2^{O(k)})$ duels. For the stochastic setting, we reduce this case to the deterministic case, under the assumption that the probabilities are bounded away from $1/2$.

Research Areas