GIST: Greedy Independent Set Thresholding for Max-Min Diversification with Submodular Utility

Gui Citovsky
The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

Abstract

We introduce a novel subset selection problem called \emph{min-distance diversification with monotone submodular utility} (MDMS), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection.
Given a set of points in a metric space, the goal of MDMS is to maximize an objective function combining a monotone submodular utility term and a min-distance diversity term between any pair of selected points, subject to a cardinality constraint.
We propose the $\GIST$ algorithm, which achieves a $\sfrac{1}{2}$-approximation guarantee for MDMS by approximating a series of maximum independent set problems with a bicriteria greedy algorithm.
We also prove that it is NP-hard to approximate to within a factor of $0.5584$.
Finally, we demonstrate that $\GIST$ outperforms existing benchmarks for on a real-world image classification task that studies single-shot subset selection for ImageNet.
×