# Nikos Parotsidis

Nikos is a Research Scientist working on Machine Learning problems, Graph Problems, and their intersection.

### Research Areas

Authored Publications

Sort By

Correlation Clustering in Constant Many Parallel Rounds

Ashkan Norouzi Fard

Jakub Tarnawski

Slobodan Mitrović

ICML (2022) (to appear)

Preview abstract
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability\nnote{I would remove "scalability": it is not clear that this will be demonstrated with mid-sized graphs} evaluation of our techniques.
View details

Near-Optimal Correlation Clustering with Privacy

Ashkan Norouzi Fard

Chenglin Fan

Jakub Tarnawski

Slobodan Mitrović

NeurIPS 2022 (2022) (to appear)

Preview abstract
Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labeling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes' preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our additive error is stronger than the one shown in prior work and is optimal up to polylogarithmic factors for fixed privacy parameters.
View details

Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor

Debarati Das

Evangelos Kipouridis

Mikkel Thorup

FOCS 2021

Preview abstract
We consider the numerical taxonomy problem of fitting a positive distance function $\calD:{S\choose 2}\rightarrow \mathbb R_{>0}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match
those in $\calD$. A nice application is in evolutionary biology
where the tree $T$ aims to approximate the branching process leading
to the observed distances in $\calD$ [Cavalli-Sforza and Edwards 1967].
We consider the total error, that is the sum of distance errors over
all pairs of points. We present a polynomial time algorithm
minimizing the total error within a constant factor. We can do this
both for general trees, and for the special case of ultrametrics with
a root having the same distance to all vertices in $S$.
The problems are known to be APX-hard, so a constant factor is the best we can hope for.
The best previous approximation factor was
$\calO((\log n)(\log \log n))$ by Ailon and Charikar [2005] who wrote ``Determining whether an
$\calO(1)$ approximation can be obtained is a fascinating question".
View details

Faster Algorithms for All-Pairs LCA in DAGs

Aleksander Łukasiewicz

Fabrizio Grandoni

Giuseppe F. Italiano

Przemysław Uznański

SODA 2021 (to appear)

Preview abstract
Let G = (V, E) be an n-vertex directed acyclic graph (DAG). A lowest common ancestor
(LCA) of two vertices u and v is a common ancestor w of u and v such that no descendant
of w has the same property. In this paper, we consider the problem of computing an LCA,
if any, for all pair of vertices in a DAG. The fastest known algorithms for this problem
exploit fast matrix multiplication subroutines, and have running time ranging from O(n^{2.687})
[Bender et al. SODA’01] down to O(n^{2.615}) [Kowaluk and Lingas ICALP’05] and O(n^{2.569})
[Czumaj et al. TCS’07].
In this paper we make the first improvement of the running time in 13 years, namely we
present a O(n^{2.447}) algorithm for this problem. Our key tool is a fast algorithm to partition
the vertex set of the transitive closure of G into a collection of O(l) chains and O(n/l)
antichains, for a given parameter l. As usual a chain is a path while an antichain is an
independent set. We then find, for all pairs of vertices, a candidate LCA among the chain
and antichain vertices, separately. The first set is obtained via a reduction to min-max
matrix multiplication. The computation of the second set can be reduced to Boolean matrix
multiplication similarly to previous results on this problem. We finally combine the two
solutions together in a careful (non-obvious) manner.
View details