Nikos Parotsidis
Nikos is a Research Scientist working on Machine Learning problems, Graph Problems, and their intersection.
Research Areas
Authored Publications
Sort By
Correlation Clustering in Constant Many Parallel Rounds
Ashkan Norouzi Fard
Jakub Tarnawski
Slobodan Mitrović
ICML (2022) (to appear)
Preview abstract
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability\nnote{I would remove "scalability": it is not clear that this will be demonstrated with mid-sized graphs} evaluation of our techniques.
View details
Preview abstract
In the correlation clustering problem the input is a signed graph where the sign indicates whether each pair of points should be placed in the same cluster or not. The goal of the problem is to compute a clustering which minimizes the number of disagreements with such recommendation. Thanks to its many practical applications, correlation clustering is a fundamental unsupervised learning problem and has been extensively studied in many different settings. In this paper we study the problem in the classic online setting with recourse; The vertices of the graphs arrive in an online manner and the goal is to maintain an approximate clustering while minimizing the number of times each vertex changes cluster.
Our main contribution is an algorithm that achieves logarithmic recourse per vertex in the worst case. We also complement this result with a tight lower bound. Finally we show experimentally that our algorithm achieves better performances than state-of-the-art algorithms on real world data.
View details
Near-Optimal Correlation Clustering with Privacy
Ashkan Norouzi Fard
Chenglin Fan
Jakub Tarnawski
Slobodan Mitrović
NeurIPS 2022 (2022) (to appear)
Preview abstract
Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labeling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes' preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our additive error is stronger than the one shown in prior work and is optimal up to polylogarithmic factors for fixed privacy parameters.
View details
Preview abstract
We consider the classic facility location problem in fully dynamic data streams, where elements can be both inserted and deleted. In this problem, one is interested in maintaining a stable and high quality solution throughout the data stream while using only little time per update (insertion or deletion). We study the problem and provide the first algorithm that at the same time maintains a constant approximation and only uses near-linear time per update and incurs in polylogarithmic recourse per update. We complement our theoretical results with an experimental analysis showing the practical efficiency of our method.
View details
Faster Algorithms for All-Pairs LCA in DAGs
Aleksander Łukasiewicz
Fabrizio Grandoni
Giuseppe F. Italiano
Przemysław Uznański
SODA 2021 (to appear)
Preview abstract
Let G = (V, E) be an n-vertex directed acyclic graph (DAG). A lowest common ancestor
(LCA) of two vertices u and v is a common ancestor w of u and v such that no descendant
of w has the same property. In this paper, we consider the problem of computing an LCA,
if any, for all pair of vertices in a DAG. The fastest known algorithms for this problem
exploit fast matrix multiplication subroutines, and have running time ranging from O(n^{2.687})
[Bender et al. SODA’01] down to O(n^{2.615}) [Kowaluk and Lingas ICALP’05] and O(n^{2.569})
[Czumaj et al. TCS’07].
In this paper we make the first improvement of the running time in 13 years, namely we
present a O(n^{2.447}) algorithm for this problem. Our key tool is a fast algorithm to partition
the vertex set of the transitive closure of G into a collection of O(l) chains and O(n/l)
antichains, for a given parameter l. As usual a chain is a path while an antichain is an
independent set. We then find, for all pairs of vertices, a candidate LCA among the chain
and antichain vertices, separately. The first set is obtained via a reduction to min-max
matrix multiplication. The computation of the second set can be reduced to Boolean matrix
multiplication similarly to previous results on this problem. We finally combine the two
solutions together in a careful (non-obvious) manner.
View details
Preview abstract
In this paper we present an efficient reachability oracle under single-edge or single-vertex
failures for planar directed graphs. Specifically, we show that a planar digraph G can be
preprocessed in O(n log^2 n/log log n) time, producing an O(n log n)-space data structure that can
answer in O(log n) time whether u can reach v in G if the vertex x (the edge f) is removed from
G, for any query vertices u, v and failed vertex x (failed edge f). To the best of our knowledge,
this is the first data structure for planar directed graphs with nearly-optimal preprocessing time
that is capable of answering all-pairs queries under any kind of failures in polylogarithmic time.
We also consider 2-reachability problems, where we are given a planar digraph G and we
wish to determine if there are two vertex-disjoint (edge-disjoint) paths from u to v, for query
vertices u, v. In this setting we provide a nearly-optimal 2-reachability oracle, which is the
existential variant of the reachability oracle under single failures, with the following bounds. We
can construct in O(n polylog n) time an O(n log^{3+o(1)} n)-space data structure that can check in
O(log^{2+o(1)} n) time for any query vertices u, v whether v is 2-reachable from u, or otherwise find
some separating vertex (edge) x lying on all paths from u to v in G.
To obtain our results, we follow the general recursive approach of Thorup for reachability in
planar graphs [J. ACM ’04] and we present new data structures which generalize dominator trees
and previous data structures for strong-connectivity under failures [Georgiadis et al., SODA ’17].
Our new data structures work also for general digraphs and may be of independent interest.
View details
Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor
Debarati Das
Evangelos Kipouridis
Mikkel Thorup
FOCS 2021
Preview abstract
We consider the numerical taxonomy problem of fitting a positive distance function $\calD:{S\choose 2}\rightarrow \mathbb R_{>0}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match
those in $\calD$. A nice application is in evolutionary biology
where the tree $T$ aims to approximate the branching process leading
to the observed distances in $\calD$ [Cavalli-Sforza and Edwards 1967].
We consider the total error, that is the sum of distance errors over
all pairs of points. We present a polynomial time algorithm
minimizing the total error within a constant factor. We can do this
both for general trees, and for the special case of ultrametrics with
a root having the same distance to all vertices in $S$.
The problems are known to be APX-hard, so a constant factor is the best we can hope for.
The best previous approximation factor was
$\calO((\log n)(\log \log n))$ by Ailon and Charikar [2005] who wrote ``Determining whether an
$\calO(1)$ approximation can be obtained is a fascinating question".
View details