Video2Text: Learning to Annotate Video Content

Hrishikesh Aradhye; George Toderici; Jay Yagnik

ICDM Workshop on Internet Multimedia Mining (2009)

Abstract

This paper discusses a new method for automatic
discovery and organization of descriptive concepts (labels)
within large real-world corpora of user-uploaded multimedia,
such as YouTube.com. Conversely, it also provides validation
of existing labels, if any. While training, our method does not
assume any explicit manual annotation other than the weak
labels already available in the form of video title, description,
and tags. Prior work related to such auto-annotation
assumed that a vocabulary of labels of interest (e.g., indoor,
outdoor, city, landscape) is specified a priori. In contrast,
the proposed method begins with an empty vocabulary. It
analyzes audiovisual features of 25 million YouTube.com videos
– nearly 150 years of video data – effectively searching for
consistent correlation between these features and text metadata.
It autonomously extends the label vocabulary as and when it
discovers concepts it can reliably identify, eventually leading
to a vocabulary with thousands of labels and growing. We
believe that this work significantly extends the state of the art
in multimedia data mining, discovery, and organization based
on the technical merit of the proposed ideas as well as the
enormous scale of the mining exercise in a very challenging,
unconstrained, noisy domain.
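
The vocabulary-growing idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the features, the classifier, or the acceptance test, so everything here is an assumption made for illustration. The function name `grow_vocabulary`, the nearest-centroid classifier, the even/odd train and held-out split, and the `min_support` and `min_precision` thresholds are all hypothetical. The sketch starts from an empty vocabulary, treats each metadata word as a candidate label, and keeps a word only if a classifier trained on the weak labels predicts it with high precision on held-out videos.

```python
from collections import Counter


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grow_vocabulary(videos, min_support=3, min_precision=0.8):
    """Hypothetical sketch: videos is a list of (features, metadata_words).

    Returns {word: held_out_precision} for words that pass the check.
    """
    train, held = videos[0::2], videos[1::2]  # assumed simple split
    counts = Counter(w for _, words in train for w in words)
    vocabulary = {}  # the vocabulary starts empty
    for word, count in sorted(counts.items()):
        if count < min_support:
            continue  # too few weakly labeled examples
        pos = [f for f, ws in train if word in ws]
        neg = [f for f, ws in train if word not in ws]
        if not neg:
            continue  # word appears on every video, nothing to contrast
        c_pos, c_neg = centroid(pos), centroid(neg)
        # Nearest-centroid prediction on held-out videos; record whether
        # each predicted positive actually carries the word in its metadata.
        hits = [word in ws for f, ws in held
                if sq_dist(f, c_pos) < sq_dist(f, c_neg)]
        if hits and sum(hits) / len(hits) >= min_precision:
            vocabulary[word] = sum(hits) / len(hits)
    return vocabulary


def toy_videos():
    """Synthetic data: two feature clusters plus uninformative words."""
    videos = []
    for label, base in (("cat", 0.0), ("dog", 10.0)):
        for k in range(8):
            words = {label, "video"}
            if k < 4:
                words.add("fun")  # word unrelated to the features
            videos.append(([base + 0.1 * k, base], words))
    return videos


learned = grow_vocabulary(toy_videos())
print(sorted(learned))
```

On this synthetic data, "cat" and "dog" correlate with the features and are admitted, while "video" (present on every item) and "fun" (unrelated to the features) are not. A real system at the scale described would need far stronger features, classifiers, and validation than this sketch.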
