Google Research

Bottom-Up Hierarchical Inference for DP-Means

ICML (2020)

Abstract

DP-Means is a flexible objective for flat clustering which can adapt to different numbers of cluster present in the data (Broderick et al., 2013). It is derived from small variance asymptotic analysis of the non-parametric Dirichlet Process mixture models (Kulis & Jordan, 2012). DP-Means inference, however, often suffer from high sensitivity to initialization, similar to its parametric relative K-Means (Bachem et al., 2015). Bottom-up hierarchical clustering methods, which slice the tree structure to obtain a (flat) partition, have become an effective alternative K-Means inference method that alleviates initialization challenges (Großwendt et al., 2019). In this paper, we propose a hierarchical inference algorithm for DPmeans, drawing connections between several well studied methods such as agglomerative clustering, Bayesian hierarchical clustering, and facility location. When data is well separated, we show that the proposed method is a 2-approximation of the DP-Means objective. Further, our method excels experimentally on datasets of millions of points, producing high quality flat clustering solutions to DP-Means as well as tree structures.

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work