To crop or not to crop: comparing whole-image and cropped classification on a large dataset of camera trap images

Jorge Ahumada
Sara Beery
Stefan Istrate
Clint Kim
Tanya Birch
Tomer Gadot
IET Computer Vision (2024)

Abstract

Camera traps are frequently used for non-invasive monitoring of wildlife, but their widespread adoption has created a data processing bottleneck: a single camera trap survey can produce millions of images, and the labor required to review those images strains the resources of conservation organizations. AI is a promising approach for accelerating image review (i.e., semi-automatically identifying the species that are present in each image), but AI tools for camera trap data are still imperfect; in particular, classifying small animals remains difficult, and accuracy falls off outside the ecosystems in which a model was trained. It has been proposed that incorporating an object detector into a camera trap image analysis pipeline may help address these challenges, but the benefit of object detection for camera trap image analysis has not been systematically evaluated in the literature. In this work, we assess the hypothesis that classifying animals cropped from camera trap images using a species-agnostic detector yields better accuracy than classifying whole images. We find that incorporating an object detection stage into an image classification pipeline yields a macro-average F1 improvement of around 25% on a very large, long-tailed dataset, and that this improvement is reproducible on a large public dataset and a smaller public benchmark dataset. We describe a classification architecture that performs well for both whole images and detector-cropped animals, and demonstrate that this architecture achieves state-of-the-art performance on a public benchmark dataset.
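
The abstract describes a two-stage pipeline in which a species-agnostic detector localizes animals and a classifier labels each resulting crop. The sketch below illustrates that general detect-crop-classify flow only; the paper's specific detector, classifier, confidence threshold, and preprocessing are not given here, so generic torchvision models and an assumed threshold stand in for them.

```python
# Minimal sketch of a detect-then-classify camera trap pipeline (illustrative only).
# The detector, classifier, threshold, and crop preprocessing are assumptions,
# not the models or settings used in the paper.
import torch
import torchvision
from torchvision.transforms import functional as F
from PIL import Image

# Stand-in class-agnostic detector (any animal/person/vehicle detector could be used here).
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Stand-in species classifier; in practice this would be trained on detector-cropped animals.
classifier = torchvision.models.resnet50(weights="DEFAULT").eval()

CONFIDENCE_THRESHOLD = 0.5  # assumed value, not taken from the paper


def classify_crops(image_path: str):
    """Detect candidate animals, crop each detection, and classify the crops."""
    image = Image.open(image_path).convert("RGB")
    tensor = F.to_tensor(image)

    with torch.no_grad():
        detections = detector([tensor])[0]

    results = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < CONFIDENCE_THRESHOLD:
            continue
        x0, y0, x1, y1 = (int(v) for v in box.tolist())
        crop = image.crop((x0, y0, x1, y1))

        # Resize and normalize the crop the way the classifier expects.
        crop_tensor = F.normalize(
            F.to_tensor(F.resize(crop, [224, 224])),
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        ).unsqueeze(0)

        with torch.no_grad():
            logits = classifier(crop_tensor)
        results.append((box.tolist(), float(score), int(logits.argmax())))
    return results
```

Compared with whole-image classification, the only change is that the classifier sees one normalized crop per detection rather than the full frame, which is the design choice the paper evaluates.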