The label shift problem refers to the supervised learning setting where the train and test label distributions do not match. Existing work on this problem largely assumes access to an unlabelled test sample, which may be used to estimate the test label distribution. While such techniques have proven effective, it is not always feasible to access the target domain; further, they require retraining for each environment if the model is to be deployed in multiple test environments. Can one instead learn a single classifier that is robust to arbitrary shifts from a certain family? In this paper, we propose such a technique based on distributionally robust optimization (DRO) using f-divergences. We design a gradient descent-proximal mirror ascent algorithm tailored for large-scale finite-sum problems to efficiently optimize this objective, and establish its convergence. We show through experiments on CIFAR-100 and ImageNet that our technique can significantly improve performance over a number of baselines in settings where the test label distribution is varied.
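To make the adversarial side of such an objective concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes one particular f-divergence (the KL divergence), a penalized rather than constrained formulation, and fixed per-class losses; the function names, the penalty weight `lam`, and the step size `eta` are all hypothetical. Under those assumptions, the adversary picks label weights q maximizing sum_y q_y * loss_y - lam * KL(q || p), where p is the training label distribution, and a proximal mirror ascent step on the simplex has a closed form.

```python
import numpy as np

def prox_mirror_ascent_step(q, p, class_losses, lam, eta):
    """One proximal mirror ascent step for the adversary's label weights.

    Solves  argmax_q' <class_losses, q'> - lam * KL(q' || p) - (1/eta) * KL(q' || q)
    over the probability simplex. The KL penalty is handled exactly
    (the "proximal" part) instead of being linearized.
    """
    a = 1.0 / eta
    log_q = (a * np.log(q) + lam * np.log(p) + class_losses) / (a + lam)
    log_q -= log_q.max()          # shift for numerical stability
    q_new = np.exp(log_q)
    return q_new / q_new.sum()    # renormalize onto the simplex

def worst_case_weights(p, class_losses, lam):
    """Closed-form maximizer of <class_losses, q> - lam * KL(q || p)."""
    logits = np.log(p) + class_losses / lam
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

# Illustrative (made-up) numbers: three classes, the rare class has the highest loss.
p = np.array([0.7, 0.2, 0.1])             # training label distribution
class_losses = np.array([0.2, 1.0, 2.5])  # average loss per class, held fixed here

q = p.copy()
for _ in range(200):
    q = prox_mirror_ascent_step(q, p, class_losses, lam=1.0, eta=0.5)

q_star = worst_case_weights(p, class_losses, lam=1.0)
```

With the losses held fixed, the iterates contract toward the closed-form maximizer, which shifts weight toward the classes with higher loss. In a full training loop the model parameters would also be updated by gradient descent on the q-weighted loss between adversary steps, so the per-class losses would change from iteration to iteration.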