An Online Algorithm for Large Scale Image Similarity Learning

Gal Chechik
Varun Sharma
Uri Shalit
Samy Bengio
Advances in Neural Information Processing Systems (2009)

Abstract

Learning a measure of similarity between pairs of objects is a
fundamental problem in machine learning. It lies at the core of
classification methods like kernel machines, and is particularly
useful for applications like searching for images that are similar
to a given image or finding videos that are relevant to a given
video. In these tasks, users look for objects that are not only
visually similar but also semantically related to a given
object. Unfortunately, current approaches for learning similarity do
not scale to large datasets, especially when imposing metric
constraints on the learned similarity.
We describe OASIS, a method for learning pairwise similarity that is
fast and scales linearly with the number of objects and the number of
non-zero features. Scalability is achieved through online learning of a
bilinear model over sparse representations using a large margin
criterion and an efficient hinge loss cost. OASIS is accurate at a
wide range of scales: on a standard benchmark with thousands of
images, it is more precise than state-of-the-art methods, and faster
by orders of magnitude. On 2 million images collected from the web,
OASIS can be trained within 3 days on a single CPU. The non-metric
similarities learned by OASIS can be transformed into metric
similarities, achieving higher precision than similarities that are
learned as metrics in the first place. This suggests an approach for
learning a metric from data that is larger by two orders of magnitude
than was handled before.
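
The online learning step described above can be illustrated with a minimal sketch. The abstract does not give the update rule, so this sketch assumes a passive-aggressive style update on triplets (a query, a relevant object, an irrelevant object) with a hinge loss and margin 1. The names `similarity`, `oasis_step` and the aggressiveness parameter `C` are illustrative choices, not taken from the text above.

```python
def similarity(W, p, q):
    """Bilinear score p^T W q for sparse vectors stored as {index: value}."""
    return sum(pv * W.get((i, j), 0.0) * qv
               for i, pv in p.items() for j, qv in q.items())


def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One online update on a triplet (assumed passive-aggressive form).

    W is a sparse matrix stored as {(row, col): value} and is modified in place.
    Returns the hinge loss measured before the update.
    """
    # Large margin criterion: p should score p_pos above p_neg by at least 1.
    loss = max(0.0, 1.0 - similarity(W, p, p_pos) + similarity(W, p, p_neg))
    if loss == 0.0:
        return loss  # margin already satisfied, leave W unchanged

    # Sparse difference vector p_pos - p_neg.
    diff = dict(p_pos)
    for j, v in p_neg.items():
        diff[j] = diff.get(j, 0.0) - v

    # The update direction is the outer product V = p diff^T, whose squared
    # Frobenius norm factorizes as ||p||^2 * ||diff||^2.
    norm_sq = sum(v * v for v in p.values()) * sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return loss

    tau = min(C, loss / norm_sq)  # step size capped by C
    for i, pv in p.items():
        for j, dv in diff.items():
            W[(i, j)] = W.get((i, j), 0.0) + tau * pv * dv
    return loss


# Usage: start from the identity matrix and apply one update.
W = {(i, i): 1.0 for i in range(3)}
query, relevant, irrelevant = {0: 1.0}, {1: 1.0}, {0: 1.0}
oasis_step(W, query, relevant, irrelevant, C=1.0)
```

In this sketch each update touches only the entries of `W` indexed by the non-zero features of the query and of the difference vector, which is one way the cost per step can depend on the number of non-zero features rather than on the full dimensionality.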
