An Analysis of Object Representations in Deep Visual Trackers

Ross Goroshin; Jonathan Tompson; Debidatta Dwibedi

Google Research (2020)

Abstract

Fully convolutional deep correlation networks are currently the state-of-the-art approach to single-object visual tracking. It is commonly assumed that these networks perform tracking by detection, matching features of the object instance with features of the scene. Strong architectural priors and conditioning on the object representation are thought to encourage this tracking strategy. Despite these efforts, we show that deep trackers often default to “tracking by saliency” detection, without relying on the object representation. This leads us to introduce an auxiliary detection task that encourages more discriminative object representations and improves tracking performance.
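
The matching step that the abstract attributes to correlation trackers can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `correlation_response` and `locate` are hypothetical, the feature maps are plain nested lists indexed as [channel][row][col], and the code assumes that features have already been extracted by some network. It slides the object (template) features over the scene (search) features and takes the position of the peak response as the predicted location.

```python
def correlation_response(template, search):
    """Valid cross-correlation of template features with search features.

    Both arguments are nested lists indexed as [channel][row][col] and
    are assumed to have the same number of channels (illustrative layout).
    """
    channels = len(template)
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Inner product between the template and the window at (i, j).
            score = 0.0
            for c in range(channels):
                for u in range(th):
                    for v in range(tw):
                        score += template[c][u][v] * search[c][i + u][j + v]
            row.append(score)
        response.append(row)
    return response


def locate(response):
    """Return the (row, col) position of the highest response."""
    positions = (
        (i, j)
        for i in range(len(response))
        for j in range(len(response[0]))
    )
    return max(positions, key=lambda ij: response[ij[0]][ij[1]])
```

In this sketch the response map depends on the template at every position. A tracker that defaults to saliency, as the abstract describes, would produce much the same peak regardless of which template it was given.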
