Google Research

CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-supervised Learning

CVPR (2021) (to appear)

Abstract

Semi-supervised learning (SSL) on unbalanced data has been under-studied. In this work, we observe that standard SSL algorithms are biased towards majority classes and produce low recall on minority classes. However, they can generate highly accurate pseudo-labels on minority classes, and these pseudo-labels are not yet fully utilized. This motivates us to propose a simple yet effective algorithm, named Class-Rebalancing Self-Training (CReST), to improve existing SSL algorithms on unbalanced data. CReST iteratively retrains a baseline SSL model on a dynamic labeled set, which is expanded by adding pseudo-labeled samples from the unlabeled set; pseudo-labeled samples from minority classes are added more frequently, based on the estimated class distribution. The SSL model is also equipped with an adaptive distribution alignment strategy. We show that CReST improves the state-of-the-art FixMatch on various unbalanced datasets by as much as 11.8%, and consistently outperforms other popular rebalancing algorithms. CReST is an easy-to-use component that can be plugged into any SSL algorithm to improve its handling of data imbalance.
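The class-rebalancing step described above can be illustrated with a minimal sketch. The abstract does not state how the per-class inclusion frequency is computed, so the power-law rule below, the `alpha` exponent, and the function names `class_sampling_rates` and `select_pseudo_labeled` are all assumptions made for illustration, not the authors' implementation. The sketch only shows the general idea: the rarer a class is in the estimated class distribution, the more of its pseudo-labeled samples are added to the labeled set.

```python
import numpy as np


def class_sampling_rates(class_counts, alpha=1.0):
    """Hypothetical rule: rarer classes get a higher inclusion rate.

    The l-th most frequent class receives the rate
    (count of the l-th least frequent class / largest count) ** alpha,
    so the rarest class is always kept (rate 1.0).
    """
    counts = np.asarray(class_counts, dtype=float)
    order = np.argsort(-counts)          # class indices, most frequent first
    sorted_counts = counts[order]
    rates_sorted = (sorted_counts[::-1] / sorted_counts[0]) ** alpha
    rates = np.empty_like(counts)
    rates[order] = rates_sorted          # map rates back to original class ids
    return rates


def select_pseudo_labeled(pseudo_labels, class_counts, alpha=1.0, seed=0):
    """Return indices of unlabeled samples to add to the labeled set."""
    rng = np.random.default_rng(seed)
    rates = class_sampling_rates(class_counts, alpha)
    pseudo_labels = np.asarray(pseudo_labels)
    # Keep each sample with the probability assigned to its predicted class.
    keep = rng.random(pseudo_labels.shape[0]) < rates[pseudo_labels]
    return np.flatnonzero(keep)


if __name__ == "__main__":
    counts = [100, 10, 1]                # assumed estimated class distribution
    print(class_sampling_rates(counts))
    labels = np.array([0] * 50 + [1] * 5 + [2] * 2)
    print(select_pseudo_labeled(labels, counts))
```

In a full self-training loop, a selection step like this would run once per generation: train the baseline SSL model, predict pseudo-labels on the unlabeled set, add the selected samples to the labeled set, and retrain.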
