Jump to Content

Fitting Metrics and Ultrametrics with Minimum Disagreements

Arnaud de Mesmay
Chenglin Fan
Euiwoong Lee
FOCS'22 (2022) (to appear)
Google Scholar

Abstract

Given n x n recording pairwise distances x, the Metric Violation Distance prob- lem asks to compute the L_0 distance between x and the metric cone; i.e., modify the minimum number of entries of x to make it a metric. Due to its large number of applications in various data analysis and optimization tasks, this problem has been actively studied recently. We present an Oplog nq-approximation algorithm for Metric Violation Distance, expo- nentially improving the previous best approximation ratio of O(OPT ^{1/3}) of Fan, Raichel, and Van Buskirk [SODA, 2018]. Furthermore, a major strength of our algorithm is its simplicity and running time. We also study the related problem of Ultrametric Violation Distance, where the goal is to compute the L0 distance to the cone of ultrametrics, and achieve a constant factor approximation algorithm. The Ultrametric Violation Distance problem can be regarded as an extension of the problem of fitting ultrametrics studied by Ailon and Charikar [SIAM J. Computing, 2011] and by Cohen-Addad, Das, Kipouridis, Parotsidis, and Thorup [FOCS, 2021] from L1 norm to L0 norm. We show that this problem can be favorably interpreted as an instance of Correlation Clustering with an additional hierarchical structure, which we solve using a new O(1)-approximation algorithm for correlation clustering that has the struc- tural property that it outputs a refinement of the optimum clusters. An algorithm satisfying such a property can be considered of independent interest. We also provide an O(log n log log n) approximation algorithm for weighted instances. Finally, we investigate the complementary version of these problems where one aims at choosing a maximum number of entries of x form- ing an (ultra-)metric. In stark contrast with the minimization versions, we prove that these maximization versions are hard to approximate within any constant factor assuming the Unique Games Conjecture.