Nikos Parotsidis

Nikos Parotsidis

Nikos is a Research Scientist working on Machine Learning problems, Graph Problems, and their intersection.
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability\nnote{I would remove "scalability": it is not clear that this will be demonstrated with mid-sized graphs} evaluation of our techniques. View details
    Preview abstract In the correlation clustering problem the input is a signed graph where the sign indicates whether each pair of points should be placed in the same cluster or not. The goal of the problem is to compute a clustering which minimizes the number of disagreements with such recommendation. Thanks to its many practical applications, correlation clustering is a fundamental unsupervised learning problem and has been extensively studied in many different settings. In this paper we study the problem in the classic online setting with recourse; The vertices of the graphs arrive in an online manner and the goal is to maintain an approximate clustering while minimizing the number of times each vertex changes cluster. Our main contribution is an algorithm that achieves logarithmic recourse per vertex in the worst case. We also complement this result with a tight lower bound. Finally we show experimentally that our algorithm achieves better performances than state-of-the-art algorithms on real world data. View details
    Near-Optimal Correlation Clustering with Privacy
    Ashkan Norouzi Fard
    Chenglin Fan
    Jakub Tarnawski
    Slobodan Mitrović
    NeurIPS 2022 (2022) (to appear)
    Preview abstract Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labeling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes' preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our additive error is stronger than the one shown in prior work and is optimal up to polylogarithmic factors for fixed privacy parameters. View details
    Preview abstract We consider the classic facility location problem in fully dynamic data streams, where elements can be both inserted and deleted. In this problem, one is interested in maintaining a stable and high quality solution throughout the data stream while using only little time per update (insertion or deletion). We study the problem and provide the first algorithm that at the same time maintains a constant approximation and only uses near-linear time per update and incurs in polylogarithmic recourse per update. We complement our theoretical results with an experimental analysis showing the practical efficiency of our method. View details
    Faster Algorithms for All-Pairs LCA in DAGs
    Aleksander Łukasiewicz
    Fabrizio Grandoni
    Giuseppe F. Italiano
    Przemysław Uznański
    SODA 2021 (to appear)
    Preview abstract Let G = (V, E) be an n-vertex directed acyclic graph (DAG). A lowest common ancestor (LCA) of two vertices u and v is a common ancestor w of u and v such that no descendant of w has the same property. In this paper, we consider the problem of computing an LCA, if any, for all pair of vertices in a DAG. The fastest known algorithms for this problem exploit fast matrix multiplication subroutines, and have running time ranging from O(n^{2.687}) [Bender et al. SODA’01] down to O(n^{2.615}) [Kowaluk and Lingas ICALP’05] and O(n^{2.569}) [Czumaj et al. TCS’07]. In this paper we make the first improvement of the running time in 13 years, namely we present a O(n^{2.447}) algorithm for this problem. Our key tool is a fast algorithm to partition the vertex set of the transitive closure of G into a collection of O(l) chains and O(n/l) antichains, for a given parameter l. As usual a chain is a path while an antichain is an independent set. We then find, for all pairs of vertices, a candidate LCA among the chain and antichain vertices, separately. The first set is obtained via a reduction to min-max matrix multiplication. The computation of the second set can be reduced to Boolean matrix multiplication similarly to previous results on this problem. We finally combine the two solutions together in a careful (non-obvious) manner. View details
    Planar Reachability Under Single Vertex or Edge Failures
    Adam Karczmarz
    Giuseppe F. Italiano
    SODA 2021 (to appear)
    Preview abstract In this paper we present an efficient reachability oracle under single-edge or single-vertex failures for planar directed graphs. Specifically, we show that a planar digraph G can be preprocessed in O(n log^2 n/log log n) time, producing an O(n log n)-space data structure that can answer in O(log n) time whether u can reach v in G if the vertex x (the edge f) is removed from G, for any query vertices u, v and failed vertex x (failed edge f). To the best of our knowledge, this is the first data structure for planar directed graphs with nearly-optimal preprocessing time that is capable of answering all-pairs queries under any kind of failures in polylogarithmic time. We also consider 2-reachability problems, where we are given a planar digraph G and we wish to determine if there are two vertex-disjoint (edge-disjoint) paths from u to v, for query vertices u, v. In this setting we provide a nearly-optimal 2-reachability oracle, which is the existential variant of the reachability oracle under single failures, with the following bounds. We can construct in O(n polylog n) time an O(n log^{3+o(1)} n)-space data structure that can check in O(log^{2+o(1)} n) time for any query vertices u, v whether v is 2-reachable from u, or otherwise find some separating vertex (edge) x lying on all paths from u to v in G. To obtain our results, we follow the general recursive approach of Thorup for reachability in planar graphs [J. ACM ’04] and we present new data structures which generalize dominator trees and previous data structures for strong-connectivity under failures [Georgiadis et al., SODA ’17]. Our new data structures work also for general digraphs and may be of independent interest. View details
    Preview abstract We consider the numerical taxonomy problem of fitting a positive distance function $\calD:{S\choose 2}\rightarrow \mathbb R_{>0}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match those in $\calD$. A nice application is in evolutionary biology where the tree $T$ aims to approximate the branching process leading to the observed distances in $\calD$ [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is the sum of distance errors over all pairs of points. We present a polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees, and for the special case of ultrametrics with a root having the same distance to all vertices in $S$. The problems are known to be APX-hard, so a constant factor is the best we can hope for. The best previous approximation factor was $\calO((\log n)(\log \log n))$ by Ailon and Charikar [2005] who wrote ``Determining whether an $\calO(1)$ approximation can be obtained is a fascinating question". View details