Content-based Related Video Recommendations
Abstract
This is a demo of related video recommendations, seeded from random YouTube videos, and based purely on video content signals. Traditional collaborative filtering (CF) recommendation systems suggest related videos for a given seed based on how many users watched a particular candidate video right after watching the seed video. This approach relies on aggregate user behavior and does not take the video content into account. Traditional CF approaches work very well when the seed and the candidate videos are relatively popular: they must be watched in sequence by many users in order for the CF system to identify them as related.
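As a point of reference, the co-watch signal underlying such a CF system can be illustrated with a minimal sketch (the function name and data layout below are hypothetical, chosen only to show the counting idea, not the production system):

```python
from collections import Counter, defaultdict

def cowatch_candidates(watch_histories, top_k=10):
    """Count how often a candidate video is watched immediately after a seed video.

    watch_histories: iterable of per-user lists of video ids, in watch order.
    Returns the top co-watched candidates per seed, the CF notion of "related".
    """
    transitions = defaultdict(Counter)
    for history in watch_histories:
        for seed, candidate in zip(history, history[1:]):
            transitions[seed][candidate] += 1
    return {seed: counts.most_common(top_k) for seed, counts in transitions.items()}

# Example: video "b" is the strongest CF candidate for seed "a".
histories = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
print(cowatch_candidates(histories))
```

A freshly uploaded video never appears in these counts, which is exactly the cold-start gap the demo addresses.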
In this demo, we focus on the cold-start problem, where the seed and/or the candidate video is freshly uploaded (or undiscovered), so the CF system cannot identify any related videos for it. Being able to recommend freshly uploaded videos, as well as good related videos for fresh video seeds, is important for improving freshness and user engagement. We model this as a video content-based similarity learning problem, and learn deep video embeddings trained to predict ground-truth video relationships (identified by a CF co-watch-based system) using only visual content. The system does not depend on the availability of video metadata or any click information, and generalizes to popular and tail content as well as new video uploads. It embeds any new video into a 1024-dimensional representation based on its content, and pairwise video similarity is computed simply as a dot product in the embedding space. We show that the learned video embeddings generalize beyond simple visual similarity and are able to capture complex semantic relationships.
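The retrieval step this implies is straightforward: score every candidate by the dot product of its embedding with the seed's embedding. The sketch below assumes a hypothetical `embed_video` function standing in for the learned 1024-dimensional content model; it is illustrative only and not the released system:

```python
import numpy as np

EMBED_DIM = 1024

def embed_video(video_id: str) -> np.ndarray:
    # Placeholder: a real model would compute this from the video's visual content.
    # Here we derive a deterministic pseudo-random vector from the id.
    rng = np.random.default_rng(abs(hash(video_id)) % (2**32))
    return rng.standard_normal(EMBED_DIM)

def top_related(seed_id, candidate_ids, k=5):
    """Return the k candidates with the highest dot-product similarity to the seed."""
    seed_vec = embed_video(seed_id)
    cand_matrix = np.stack([embed_video(c) for c in candidate_ids])
    scores = cand_matrix @ seed_vec          # pairwise similarity = dot product
    order = np.argsort(-scores)[:k]
    return [(candidate_ids[i], float(scores[i])) for i in order]

print(top_related("seed_video", ["cand_1", "cand_2", "cand_3"], k=2))
```

Because the score depends only on content embeddings, the same lookup works for a video with no watch history at all.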