We present an approach that effectively uses small sets of reliable labels in conjunction with massive datasets of noisy labels to learn powerful image representations. A common approach is to pre-train a network on the large set of noisy labels and then fine-tune it on the clean labels. We present an alternative: we use the clean labels to capture the structure of the label space and to learn a mapping from noisy labels to clean labels. This allows us to "clean" the dataset and to fine-tune the network using both the clean labels and the full dataset with reduced noise. The approach comprises a multi-task network that jointly learns to clean noisy labels and to annotate images with accurate labels. We evaluate our approach on the recently released Open Images dataset, which contains ~9 million images with multiple annotations per image. Our results demonstrate that the proposed approach outperforms fine-tuning across all major groups of labels in the Open Images dataset. The approach is particularly effective on the large number of labels with 20-80% label noise.
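The joint objective described above can be illustrated with a minimal NumPy sketch. The abstract does not specify the architecture or the losses, so everything here is an assumption made for illustration: the linear cleaning head, the absolute-error cleaning loss, the cross-entropy classification loss, and all names (`clean_labels`, `classify`, `multitask_loss`, `W_noisy`, `W_feat`, `W_cls`) are hypothetical. The sketch only shows the general idea: a cleaning head is supervised on the small verified subset, and the classifier is trained on verified labels where they exist and on cleaned labels elsewhere.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def clean_labels(noisy, features, W_noisy, W_feat):
    """Hypothetical cleaning head: predicts a correction to the noisy labels
    from the noisy labels and the image features, then clips to [0, 1]."""
    correction = noisy @ W_noisy + features @ W_feat
    return np.clip(noisy + correction, 0.0, 1.0)


def classify(features, W_cls):
    """Hypothetical multi-label classifier head on shared image features."""
    return sigmoid(features @ W_cls)


def multitask_loss(noisy, features, verified, has_verified, params):
    """Returns (cleaning loss, classification loss, classifier targets).

    noisy, verified: (batch, num_labels) arrays with values in [0, 1]
    has_verified:    (batch,) boolean array, True where verified labels exist
    """
    cleaned = clean_labels(noisy, features, params["W_noisy"], params["W_feat"])
    preds = classify(features, params["W_cls"])

    # Cleaning loss (assumed: absolute error), computed only on images
    # that have verified labels and averaged over those images.
    mask = has_verified[:, None].astype(float)
    num_verified = max(mask.sum(), 1.0)
    clean_loss = (np.abs(cleaned - verified) * mask).sum() / num_verified

    # Classifier targets: verified labels where available, cleaned otherwise.
    targets = np.where(has_verified[:, None], verified, cleaned)

    # Classification loss (assumed: mean binary cross-entropy).
    p = np.clip(preds, 1e-7, 1.0 - 1e-7)
    cls_loss = -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)).mean()
    return clean_loss, cls_loss, targets


# Toy batch: the first image has verified labels, the second does not.
rng = np.random.default_rng(0)
noisy = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
verified = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
has_verified = np.array([True, False])
features = rng.normal(size=(2, 4))
params = {
    "W_noisy": np.zeros((3, 3)),
    "W_feat": np.zeros((4, 3)),
    "W_cls": np.zeros((4, 3)),
}
clean_loss, cls_loss, targets = multitask_loss(
    noisy, features, verified, has_verified, params
)
```

With all weights at zero, the cleaning head leaves the noisy labels unchanged, so the second image is trained against its noisy labels while the first is trained against its verified ones. In a real system the two losses would be minimized jointly by gradient descent over shared image features; that training loop is omitted here.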