Do better ImageNet classifiers assess perceptual similarity better?

Abstract

Human-like perceptual similarity is an emergent property in the intermediate feature space of ImageNet-pretrained classifiers. Perceptual distances between images, measured in the space of pre-trained image embeddings, have significantly outperformed prior low-level, pixel-based metrics at assessing image similarity. This has led to the wide adoption of perceptual distances as both an evaluation metric and an auxiliary training objective for image synthesis tasks. Yet while image classification has improved by leaps and bounds, the de facto standard for computing perceptual distances still uses older, less accurate models such as VGG and AlexNet. Motivated by this, we evaluate the perceptual scores of modern networks: ResNets, EfficientNets and Vision Transformers. Surprisingly, we observe an inverse correlation between ImageNet accuracy and perceptual score: better classifiers achieve worse perceptual scores. We dive deeper into this finding, studying the relationship between ImageNet accuracy and perceptual score under different hyperparameter configurations. Improving accuracy improves perceptual scores up to a certain point, but beyond this point we uncover a Pareto frontier between accuracy and perceptual score. We explore this relationship further using distortion invariance, spatial frequency sensitivity and alternative perceptual functions. Based on our study, we find an ImageNet-trained ResNet-6 network whose emergent perceptual score matches the best prior score obtained by networks trained explicitly on a perceptual similarity task.
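
As a rough illustration of how such perceptual distances are typically computed from pre-trained embeddings (in the spirit of LPIPS), the sketch below taps intermediate activations of a pretrained classifier, unit-normalizes them across channels, and averages squared differences over space and layers. This is a minimal sketch, not the paper's evaluation code: the choice of AlexNet, the tapped layer indices, and the unweighted averaging across layers are illustrative assumptions (LPIPS additionally learns per-channel weights on top of such features).

```python
# Minimal, hedged sketch of an LPIPS-style perceptual distance using a
# pretrained classifier's intermediate features. Assumptions: AlexNet as the
# backbone and the ReLU outputs of its five conv blocks as the tapped layers.
import torch
import torchvision.models as models

backbone = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).features.eval()
TAP_LAYERS = {1, 4, 7, 9, 11}  # indices of the ReLU outputs in alexnet.features

def perceptual_distance(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """x, y: image batches of shape (N, 3, H, W), already ImageNet-normalized."""
    dist = torch.zeros(x.shape[0], device=x.device)
    feat_x, feat_y = x, y
    with torch.no_grad():
        for idx, layer in enumerate(backbone):
            feat_x, feat_y = layer(feat_x), layer(feat_y)
            if idx in TAP_LAYERS:
                # Unit-normalize each spatial feature vector across channels.
                nx = feat_x / (feat_x.norm(dim=1, keepdim=True) + 1e-10)
                ny = feat_y / (feat_y.norm(dim=1, keepdim=True) + 1e-10)
                # Squared difference: sum over channels, average over space,
                # accumulate (unweighted) across the tapped layers.
                dist = dist + ((nx - ny) ** 2).sum(dim=1).mean(dim=(1, 2))
    return dist
```

Swapping `backbone` for a more accurate classifier (e.g. a ResNet or Vision Transformer feature extractor) is precisely the kind of substitution whose effect on perceptual scores the paper studies.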