Supervised learning typically relies on manual annotation of the true labels. However, when there are many potential labels, it can be time-consuming for a human annotator to search through them for the best one. By contrast, comparing two candidate labels is often much easier. In this paper, we focus on this type of pairwise supervision and ask how it can be used effectively in learning, and in active learning in particular. We obtain several surprising results in this context. In principle, finding the best label out of $k$ can be done with $k-1$ active queries. However, we show that there is a natural class where this approach is in fact sub-optimal, and that there is a more comparison-efficient active learning scheme. A key element in our analysis is the ``label neighborhood graph'' of the true distribution, which has an edge between two classes if they share a decision boundary. We also show that in the PAC setting, pairwise comparisons cannot provide improved sample complexity in the worst case. We complement our theoretical results with experiments that demonstrate the effect of the neighborhood graph on sample complexity.
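To make the $k-1$ baseline concrete, the following is a minimal sketch of a sequential tournament over candidate labels. It is an illustration only, not the more comparison-efficient scheme developed in the paper. The function name `best_label_by_comparisons` and the oracle `prefer(a, b)` are hypothetical, and the oracle is assumed to be noise-free and consistent with a single best label.

```python
from typing import Callable, Hashable, Sequence, Tuple


def best_label_by_comparisons(
    labels: Sequence[Hashable],
    prefer: Callable[[Hashable, Hashable], bool],
) -> Tuple[Hashable, int]:
    """Return the preferred label and the number of pairwise queries used.

    `prefer(a, b)` is a hypothetical annotator oracle that returns True
    when label `a` is a better fit than label `b`. It is assumed to be
    noise-free and consistent (an assumption of this sketch).
    """
    if not labels:
        raise ValueError("at least one candidate label is required")

    best = labels[0]
    queries = 0
    # Compare the current winner against each remaining candidate once,
    # which uses exactly len(labels) - 1 comparisons.
    for candidate in labels[1:]:
        queries += 1
        if prefer(candidate, best):
            best = candidate
    return best, queries


# Usage with made-up scores standing in for a human annotator.
scores = {"cat": 0.2, "dog": 0.5, "fox": 0.9, "owl": 0.1}
winner, used = best_label_by_comparisons(
    list(scores), lambda a, b: scores[a] > scores[b]
)
```

With $k = 4$ candidates, the sketch issues three queries for a single example, which is the $k-1$ count mentioned above.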