Scalable, high-quality object detection

Christian Szegedy
Scott Reed
Dragomir Anguelov


Most high-quality object detection approaches use the same scheme: salience-based object proposal methods followed by post-classification using deep convolutional features. In this work, we demonstrate that fully learnt, data-driven proposal generation methods can effectively match the accuracy of their hand-engineered counterparts, while allowing for very efficient runtime-quality trade-offs. This is achieved by making several key improvements to the MultiBox method [4], among which are an improved neural network architecture, the use of contextual features, and a new loss function that is robust to missing groundtruth labels. We show that our proposal generation method can closely match the performance of Selective Search [22] at a fraction of the cost. We report a new single-model state-of-the-art result on the ILSVRC 2014 detection challenge data set, with 0.431 mean average precision when combining both Selective Search and MultiBox proposals with our post-classification model. Finally, our approach allows the training of single-class detectors that can process 50 images per second on a Xeon workstation, using CPU only, rivaling the quality of the current best-performing methods.
