Linear Relaxations for Finding Diverse Elements in Metric Spaces
Abstract
Choosing a diverse subset of a large collection of points in a metric space is a fundamental
problem, with applications in feature selection, recommender systems,
web search, data summarization, etc. Various notions of diversity have been proposed,
tailored to different applications. The general algorithmic goal is to find
a subset of points that maximize diversity, while obeying a cardinality (or more
generally, matroid) constraint. The goal of this paper is to develop a novel linear
programming (LP) framework that allows us to design approximation algorithms
for such problems. We study an objective known as sum-min diversity, which
is known to be effective in many applications, and give the first constant factor
approximation algorithm. Our LP framework allows us to easily incorporate additional
constraints, as well as secondary objectives. We also prove a hardness result
for two natural diversity objectives, under the so-called planted clique assumption.
Finally, we study the empirical performance of our algorithm on several standard
datasets. We first study the approximation quality of the algorithm by comparing
with the LP objective. Then, we compare the quality of the solutions produced by
our method with other popular diversity maximization algorithms.
problem, with applications in feature selection, recommender systems,
web search, data summarization, etc. Various notions of diversity have been proposed,
tailored to different applications. The general algorithmic goal is to find
a subset of points that maximize diversity, while obeying a cardinality (or more
generally, matroid) constraint. The goal of this paper is to develop a novel linear
programming (LP) framework that allows us to design approximation algorithms
for such problems. We study an objective known as sum-min diversity, which
is known to be effective in many applications, and give the first constant factor
approximation algorithm. Our LP framework allows us to easily incorporate additional
constraints, as well as secondary objectives. We also prove a hardness result
for two natural diversity objectives, under the so-called planted clique assumption.
Finally, we study the empirical performance of our algorithm on several standard
datasets. We first study the approximation quality of the algorithm by comparing
with the LP objective. Then, we compare the quality of the solutions produced by
our method with other popular diversity maximization algorithms.