Sublinear Time Oracles for Hierarchical Clustering

Aidasadat Mousavifar; Akash Kumar; Michael Kapralov; Silvio Lattanzi

Sublinear Time Oracles for Hierarchical Clustering

Aidasadat Mousavifar

Akash Kumar

Michael Kapralov

Silvio Lattanzi

SODA 2021 (2023) (to appear)

Google Scholar

Abstract

Learning graph cluster structure using few queries is a classical question in property testing, with the fundamental special case, namely expansion testing, considered in the seminal work of Goldreich and Ron[STOC'96].
The most recent result in this line of work, due to Gluch et al.[SODA'21], designs {\em clustering oracles} for $(k, \e)$-clusterable graphs. These oracles, given a graph whose vertex set can be partitioned into a disjoint union of $k$ clusters (i.e., good expanders) with outer conductances bounded by $\e\ll 1$, provide query access to an $O(\e \log k)$-approximation to this ground truth clustering in time $\approx 2^{\text{poly}(k/\e)} n^{1/2+O(\e)}$ per query.

In this paper we show that it is possible to learn the {\em hierarchical} cluster structure of $(k, \e)$-clusterable graphs in sublinear time. First, we show how to simulate the hierarchical clustering algorithm of Charikar-Chatziafratis[SODA'17] to approximate the Dasgupta cost of a $k$-clusterable graph to within a factor of $O(\sqrt{\log k})$ in $\approx \text{poly}(k)\cdot n^{1/2+O(\e)}$ time assuming oracle access to the clustering. Second, we introduce a natural hierarchical model of clusterable graphs, and give a bona fide clustering oracle for this model, i.e. a small space data structure that can answer hierarchical clustering queries in $\approx\text{poly}(k) \cdot n^{1/2+O(\e)}$ time per query. Notably, in both cases the query time depends on polynomially on the number $k$ of clusters. The second result is the main technical contribution of the paper, and relies on several structural properties of hierarchically clusterable graphs that we hope will be of independent interest in sublinear time spectral graph algorithms.

Research Areas

Algorithms and theory

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Sublinear Time Oracles for Hierarchical Clustering

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs