A Simple and Efficient Method to Handle Sparse Preference Data Using Domination Graphs: An Application to YouTube
Abstract
The phenomenal growth of the number of videos on YouTube provides enormous potential
for users to find content of interest to them. Unfortunately, as the size of the repository
grows, the task of discovering high-quality content becomes more daunting. To address this,
YouTube occasionally asks users for feedback on videos. In one such event (the YouTube
Comedy Slam), users were asked to rate which of two videos was funnier. This yielded sparse
pairwise data indicating a participant’s relative assessment of two videos. Given this data,
several questions immediately arise: how do we make inferences for uncompared pairs, overcome
noisy, and usually contradictory, data, and how do we handle severely skewed, real-world,
sampling? To address these questions, we introduce the concept of a domination-graph, and
demonstrate a simple and scalable method, based on the Adsorption algorithm, to efficiently
propagate preferences through the graph. Before tackling the publicly available YouTube data,
we extensively test our approach on synthetic data by attempting to recover an underlying,
known, rank-order of videos using similarly created sparse preference data.
for users to find content of interest to them. Unfortunately, as the size of the repository
grows, the task of discovering high-quality content becomes more daunting. To address this,
YouTube occasionally asks users for feedback on videos. In one such event (the YouTube
Comedy Slam), users were asked to rate which of two videos was funnier. This yielded sparse
pairwise data indicating a participant’s relative assessment of two videos. Given this data,
several questions immediately arise: how do we make inferences for uncompared pairs, overcome
noisy, and usually contradictory, data, and how do we handle severely skewed, real-world,
sampling? To address these questions, we introduce the concept of a domination-graph, and
demonstrate a simple and scalable method, based on the Adsorption algorithm, to efficiently
propagate preferences through the graph. Before tackling the publicly available YouTube data,
we extensively test our approach on synthetic data by attempting to recover an underlying,
known, rank-order of videos using similarly created sparse preference data.