Multilabel classification is a challenging problem arising in applications ranging from information retrieval to image tagging. A popular approach to this problem is to employ a reduction to a suitable series of binary or multiclass problems (e.g., computing a softmax based cross-entropy over the relevant labels). While such methods have seen empirical success, less is understood about how well they approximate two fundamental performance measures: the precision and recall@k. In this paper, we study three commonly used reductions, and two new reductions based on a normalised loss function, wherein the contribution of each instance is normalised by the number of relevant labels. A surprising outcome of our study is that that each reduction is provably consistent with respect to either precision or recall, but not both. Further, we explicate that the probability scores obtained from reductions focussed on precision must be interpreted with caution. We empirically validate our results on real-world datasets, showing in particular that our normalised loss function yields recall gains over existing reductions.