CutPaste: Self-Supervised Learning for Anomaly Detection and Localization
Abstract
We aim to construct a high-performance defect detection model that detects unknown anomalous patterns in an image without using anomalous data.
To this end, we propose a simple two-stage framework for building anomaly detectors using only normal training data: we first learn self-supervised deep representations and then build a generative one-class classifier on the learned representations. We learn representations by classifying normal data from CutPaste, a simple data augmentation strategy that cuts an image patch and pastes it at a random location of a large image.
Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm generalizes to detecting various types of real-world defects. We improve upon previous work by 3 AUC points when learning representations from scratch. By transfer learning representations from an ImageNet-pretrained model, we achieve a new state-of-the-art 96.6 AUC.
Lastly, we extend the framework to learn and extract representations from patches, which allows localization of defective areas without the need for annotations.
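As an illustration of the CutPaste augmentation and the classification task built on it, the following is a minimal NumPy sketch, not the exact procedure used in this work. The function names `cutpaste` and `make_batch`, the patch area and aspect-ratio ranges, and the label convention (0 for normal, 1 for CutPaste) are assumptions made for the example.

```python
import numpy as np


def cutpaste(image, rng, area_ratio=(0.02, 0.15), aspect_ratio=(0.3, 3.3)):
    """Return a copy of `image` (H, W, C) with one patch cut and pasted elsewhere.

    The area and aspect-ratio ranges are illustrative assumptions.
    """
    h, w = image.shape[:2]

    # Sample the patch size from a fraction of the image area and an aspect ratio.
    area = rng.uniform(*area_ratio) * h * w
    aspect = rng.uniform(*aspect_ratio)
    ph = int(np.clip(round(np.sqrt(area * aspect)), 1, h))
    pw = int(np.clip(round(np.sqrt(area / aspect)), 1, w))

    # Pick independent source (cut) and target (paste) locations.
    sy = rng.integers(0, h - ph + 1)
    sx = rng.integers(0, w - pw + 1)
    ty = rng.integers(0, h - ph + 1)
    tx = rng.integers(0, w - pw + 1)

    out = image.copy()
    out[ty:ty + ph, tx:tx + pw] = image[sy:sy + ph, sx:sx + pw]
    return out


def make_batch(images, rng):
    """Build a binary classification batch: label 0 = normal, 1 = CutPaste."""
    augmented = [cutpaste(img, rng) for img in images]
    x = np.stack(list(images) + augmented)
    y = np.concatenate([
        np.zeros(len(images), dtype=np.int64),
        np.ones(len(augmented), dtype=np.int64),
    ])
    return x, y


rng = np.random.default_rng(0)
normal_images = rng.random((4, 64, 64, 3))  # stand-in for normal training images
x, y = make_batch(normal_images, rng)
```

A network trained to separate the two labels in `(x, y)` would supply the representations on which the one-class classifier is then built; that second stage is not shown here.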