Infinite Class Mixup
Abstract
Mixup is a widely adopted strategy for training deep networks, where additional samples are augmented through a linear interpolation of input pairs and their corresponding labels. Mixup has been shown to improve classification performance, network calibration, and out-of-distribution generalization. While effective, a cornerstone of Mixup, namely that networks should learn linear behavior in between classes, is only indirectly enforced, since the output interpolation is performed at the probability level. This paper addresses this limitation by instead directly mixing the classifiers of the labels for each mixed input pair. We propose to define the target of each augmented sample as a uniquely new classifier, whose parameters are given as a linear interpolation of the classifier vectors of the input sample pair. The space of all possible classifiers is continuous and spans all interpolations between classifier pairs. To perform tractable optimization, we propose a dual-contrastive Infinite Class Mixup loss, where we contrast the unique classifier of a single pair to both the mixed classifiers and the predicted outputs of all other pairs in a batch. Infinite Class Mixup is generic in nature and applies to any variant of Mixup. Empirically, we show that our formulation outperforms standard Mixup and variants such as RegMixup and Remix on balanced and long-tailed recognition benchmarks, both at large scale and in data-constrained settings, highlighting the broad applicability of our approach.
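The mechanics summarized above can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: both function names are assumptions, embeddings are taken as given (in practice they would come from a network), and the dual-contrastive loss is realized as a cross-entropy over the batch-by-batch logit matrix in both directions, with each pair's own interpolated classifier on the diagonal as the positive.

```python
import numpy as np

def mix_pair(rng, x, labels, class_vectors, alpha=1.0):
    """Mix inputs and build the pair-specific target classifiers.

    x:             (B, D) input batch
    labels:        (B,)   integer labels
    class_vectors: (C, D) one classifier vector per class
    """
    B = x.shape[0]
    lam = rng.beta(alpha, alpha)          # interpolation coefficient
    perm = rng.permutation(B)             # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]
    # Each mixed sample gets a uniquely new classifier: the linear
    # interpolation of the two paired classes' classifier vectors.
    w_mix = lam * class_vectors[labels] + (1.0 - lam) * class_vectors[labels[perm]]
    return x_mix, w_mix

def infinite_class_mixup_loss(z_mix, w_mix):
    """Dual-contrastive loss over the (B, B) logit matrix.

    z_mix: (B, D) embeddings of the mixed inputs
    w_mix: (B, D) the pair-specific interpolated classifiers
    Diagonal entries are positives; the loss contrasts each embedding
    against all classifiers (rows) and each classifier against all
    embeddings (columns).
    """
    logits = z_mix @ w_mix.T

    def diag_cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)           # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))
```

A quick usage pass: mix a batch, treat the mixed inputs themselves as embeddings for illustration, and evaluate the loss.

```python
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
labels = np.array([0, 1, 2, 3])
class_vectors = rng.normal(size=(10, 8))
x_mix, w_mix = mix_pair(rng, x, labels, class_vectors)
loss = infinite_class_mixup_loss(x_mix, w_mix)   # finite positive scalar
```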