Fitting Metrics and Ultrametrics with Minimum Disagreements
Abstract
Given n x n recording pairwise distances x, the Metric Violation Distance prob-
lem asks to compute the L_0 distance between x and the metric cone; i.e., modify the minimum
number of entries of x to make it a metric. Due to its large number of applications in various
data analysis and optimization tasks, this problem has been actively studied recently.
We present an Oplog nq-approximation algorithm for Metric Violation Distance, expo-
nentially improving the previous best approximation ratio of O(OPT ^{1/3}) of Fan, Raichel, and
Van Buskirk [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity
and running time. We also study the related problem of Ultrametric Violation Distance,
where the goal is to compute the L0 distance to the cone of ultrametrics, and achieve a constant
factor approximation algorithm. The Ultrametric Violation Distance problem can be
regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar
[SIAM J. Computing, 2011] and by Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup
[FOCS, 2021] from L1 norm to L0 norm. We show that this problem can be favorably interpreted
as an instance of Correlation Clustering with an additional hierarchical structure, which
we solve using a new O(1)-approximation algorithm for correlation clustering that has the struc-
tural property that it outputs a refinement of the optimum clusters. An algorithm satisfying
such a property can be considered of independent interest. We also provide an O(log n log log n)
approximation algorithm for weighted instances. Finally, we investigate the complementary
version of these problems where one aims at choosing a maximum number of entries of x form-
ing an (ultra-)metric. In stark contrast with the minimization versions, we prove that these
maximization versions are hard to approximate within any constant factor assuming the Unique
Games Conjecture.
lem asks to compute the L_0 distance between x and the metric cone; i.e., modify the minimum
number of entries of x to make it a metric. Due to its large number of applications in various
data analysis and optimization tasks, this problem has been actively studied recently.
We present an Oplog nq-approximation algorithm for Metric Violation Distance, expo-
nentially improving the previous best approximation ratio of O(OPT ^{1/3}) of Fan, Raichel, and
Van Buskirk [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity
and running time. We also study the related problem of Ultrametric Violation Distance,
where the goal is to compute the L0 distance to the cone of ultrametrics, and achieve a constant
factor approximation algorithm. The Ultrametric Violation Distance problem can be
regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar
[SIAM J. Computing, 2011] and by Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup
[FOCS, 2021] from L1 norm to L0 norm. We show that this problem can be favorably interpreted
as an instance of Correlation Clustering with an additional hierarchical structure, which
we solve using a new O(1)-approximation algorithm for correlation clustering that has the struc-
tural property that it outputs a refinement of the optimum clusters. An algorithm satisfying
such a property can be considered of independent interest. We also provide an O(log n log log n)
approximation algorithm for weighted instances. Finally, we investigate the complementary
version of these problems where one aims at choosing a maximum number of entries of x form-
ing an (ultra-)metric. In stark contrast with the minimization versions, we prove that these
maximization versions are hard to approximate within any constant factor assuming the Unique
Games Conjecture.