Google Research

An Analysis of Object Representations in Deep Visual Trackers


Fully convolutional deep correlation networks are currently the state-of-the-art approach to single-object visual tracking. It is commonly assumed that these networks perform tracking by detection, matching features of the object instance against features of the scene. Strong architectural priors and conditioning on the object representation are thought to encourage this tracking strategy. Despite these efforts, we show that deep trackers often default to “tracking by saliency” detection, without relying on the object representation. This leads us to introduce an auxiliary detection task that encourages more discriminative object representations and improves tracking performance.
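As a rough illustration of the feature-matching idea described in the abstract, the sketch below slides an object template over a search region and takes the peak of the resulting response map as the predicted location. It is a minimal toy under stated assumptions, not the networks studied here: the "features" are raw numbers rather than learned deep embeddings, and the names `cross_correlate`, `argmax_2d`, `SEARCH`, and `TEMPLATE` are hypothetical.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (both 2-D lists of numbers).

    Returns a response map whose entry (i, j) is the sum of elementwise
    products between the template and the search window at offset (i, j).
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def argmax_2d(response):
    """Return the (row, col) position of the largest response value."""
    best_value, best_pos = None, None
    for i, row in enumerate(response):
        for j, value in enumerate(row):
            if best_value is None or value > best_value:
                best_value, best_pos = value, (i, j)
    return best_pos


# Hypothetical toy data: the template pattern is embedded in the search
# region with its top-left corner at row 1, column 2.
TEMPLATE = [[1, -1],
            [-1, 1]]
SEARCH = [[0, 0, 0, 0],
          [0, 0, 1, -1],
          [0, 0, -1, 1],
          [0, 0, 0, 0]]

RESPONSE = cross_correlate(SEARCH, TEMPLATE)
PREDICTED = argmax_2d(RESPONSE)  # peak of the response map
```

In this toy the response depends entirely on the template, which is the behaviour tracking by detection assumes. A tracker that defaults to saliency would, by contrast, produce much the same response map regardless of which template it is given.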
