Google Research

Exact Recovery of Clusters in Finite Metric Spacesusing Oracle Queries

COLT 2021 (to appear)

Abstract

We investigate the problem of exact cluster recovery using oracle queries. In Euclidean spaces, clusters that are convexand separated with a margin can be reconstructed exactly using only $\scO(\log n)$ same-cluster queries, where $n$ is the number of input points. In this work, we study the problem in the more challenging non-convex setting. In particular, we introduce a structural characterization of clusters, called $(\beta,\gamma)$-density, that can be applied to any finite set of points equipped with a metric (or even a semimetric, as the triangle inequality is not needed). Using $(\beta,\gamma)$-density, we can translate natural density properties of clusters (which include, for instance, clusters that strongly non-convex in $\R^d$) into a graph-theoretic notion of convexity. By exploiting this convexity notion, we design a deterministic algorithm that recovers $(\beta,\gamma)$-dense clusters using roughly $\scO(k^2 \log n + k^2 \PackNum^)$ same-cluster queries, where $k$ is the number of clusters and $\PackNum^$ is an upper bound to the packing number of the (semi)metric. We show that the dependence on the packing number is necessary, and we also show that, if we are allowed to make $\scO(k^2)$ additional queries to a ``cluster separation'' oracle, then we can recover clusters that have different and arbitrary density scales, even when the scale of each cluster is unknown.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work