Improved Coresets for Euclidean k-Means

Chris Schwiegelshohn

David Saulpic

Kasper Green Larsen

Vincent Pierre Cohen-addad

Omar Ali Sheikh-Omar

Neurips'22 (2022)

Google Scholar

Abstract

Given a set of n points in d dimensions, the Euclidean k-means problem consists of finding k centers such that the sum of squared distances from every point to its closest center is minimized. The arguably most popular way of dealing with this problem in the big data setting is to first compress the data by computing a weighted subset known as a coreset and then run any algorithm on this subset. The guarantee of the coreset is that for any candidate solution, the ratio between coreset cost and the cost of the original instance is less than a (1+epsilon) factor. The current state of the art coreset size for Euclidean k-means is O(k epsilon^{-2} min(k, epsilon^{-2})). This matches the lower bound of Omega(k epsilon^{-2}) up to the min(k, epsilon^{-2}) term. In this paper, we improve this bound to min(sqrt(k), epsilon^{-2}). In the regime where k = o(epsilon^{-2}), this is a strict improvement over the state of the art. In particular, ours is the first provable bound that breaks through the k^2 barrier while retaining an optimal dependency on epsilon.

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Improved Coresets for Euclidean k-Means

Abstract

Research Areas

Meet the teams driving innovation

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Improved Coresets for Euclidean k-Means

Abstract

Research Areas

Meet the teams driving innovation

AI/ML Foundations  & Capabilities