Google Research

Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport

Advances in Neural Information Processing Systems 2018

Abstract

Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of sophisticated data encountered in challenging learning problems. Despite several appealing properties, integrating PDs in learning pipelines can be challenging because their natural geometry is not Hilbertian. In particular, algorithms to average a family of PDs have only been considered recently and are known to be computationally prohibitive. We propose in this article a tractable framework to carry out fundamental tasks on PDs, namely evaluating distances, computing barycenters and carrying out clustering. This framework builds upon a formulation of PD metrics as optimal transport (OT) problems, for which recent computational advances, in particular entropic regularization and its convolutional formulation on regular grids, can all be leveraged to provide efficient and (GPU) scalable computations. We demonstrate the efficiency of our approach by carrying out clustering on PDs at scales never seen before in the literature.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work