Nikos Parotsidis
Nikos is a Research Scientist working on machine learning problems, graph problems, and their intersection.
Authored Publications
Streaming Trends: A Low-Latency Platform for Dynamic Video Grouping and Trending Corpora Building
Scott Wang
Caroline Zhou
Ashkan Norouzi Fard
Yaping Zhang
Li Zhang
Mingyan Gao
Qiao Zhang
2025
Abstract
This paper presents Streaming Trends, a real-time system deployed on a short-form video platform that enables dynamic content grouping, tracking videos from upload to their identification as part of a trend. Addressing the latency inherent in traditional batch processing for short-form video, Streaming Trends utilizes online clustering and flexible similarity measures to associate new uploads with relevant groups in near real-time. The system combines online processing for immediate updates triggered by uploads and seed queries with offline processes for similarity modeling and cluster quality maintenance. By facilitating the rapid identification and association of trending videos, Streaming Trends significantly enhances content discovery and user engagement on the YouTube Shorts platform.
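As a purely illustrative sketch of the online grouping step described above (not the deployed system; the embedding representation, cosine-similarity measure, and threshold value are assumptions made for this example), a new upload can be matched to the most similar existing group in near real-time or start a new group:

import numpy as np

def assign_to_group(embedding, centroids, counts, threshold=0.8):
    """Assign a new video embedding to the most similar group, or open a new one.

    centroids: list of unit-normalized np.ndarray group centroids (modified in place)
    counts: list of int, number of videos already in each group (modified in place)
    Returns the index of the group the video was assigned to.
    """
    embedding = embedding / np.linalg.norm(embedding)
    if centroids:
        sims = [float(embedding @ c) for c in centroids]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            # Pull the group's centroid toward the new video and renormalize.
            c = centroids[best] * counts[best] + embedding
            centroids[best] = c / np.linalg.norm(c)
            counts[best] += 1
            return best
    # No sufficiently similar group: start a new one.
    centroids.append(embedding)
    counts.append(1)
    return len(centroids) - 1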
Almost Optimal Fully Dynamic k-Center Clustering with Recourse
Sayan Bhattacharya
Martin Costa
Ermiya Farokhnejad
2025 International Conference on Machine Learning
Abstract
In this paper, we consider the \emph{metric $k$-center} problem in the fully dynamic setting, where we are given a metric space $(V,d)$ evolving via a sequence of point insertions and deletions and our task is to maintain a subset $S \subseteq V$ of at most $k$ points that minimizes the objective $\max_{x \in V} \min_{y \in S}d(x, y)$. We want to design our algorithm so that we minimize its \emph{approximation ratio}, \emph{recourse} (the number of changes it makes to the solution $S$) and \emph{update time} (the time it takes to handle an update).
We give a simple algorithm for dynamic $k$-center that maintains an $O(1)$-approximate solution with $O(1)$ amortized recourse and $\tilde O(k)$ amortized update time, \emph{obtaining near-optimal approximation, recourse and update time simultaneously}. We obtain our result by combining a variant of the dynamic $k$-center algorithm of Bateni et al.~[SODA'23] with the dynamic sparsifier of Bhattacharya et al.~[NeurIPS'23].
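For reference, the objective maintained by such an algorithm is the standard k-center cost. A minimal Python sketch evaluating that cost, together with the classic static greedy 2-approximation of Gonzalez as a baseline (this is not the dynamic algorithm of the paper), might look as follows:

def k_center_cost(points, centers, d):
    """Max over points of the distance to the nearest center."""
    return max(min(d(x, y) for y in centers) for x in points)

def greedy_k_center(points, k, d):
    """Gonzalez's farthest-point heuristic, a static 2-approximation in any metric.
    Assumes 1 <= k <= len(points)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from the current set of centers.
        farthest = max(points, key=lambda x: min(d(x, y) for y in centers))
        centers.append(farthest)
    return centers

Here d can be any metric, for example d = lambda a, b: abs(a - b) for points on a line.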
Fully Dynamic k-Clustering with Fast Update Time and Small Recourse
Sayan Bhattacharya
Martin Costa
Naveen Garg
2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS)
Abstract
In the dynamic metric $k$-median problem, we wish to maintain a set of $k$ centers $S \subseteq V$ in an input metric space $(V, d)$ that gets updated via point insertions/deletions, so as to minimize the objective $\sum_{x \in V} \min_{y \in S} d(x, y)$. The quality of a dynamic algorithm is measured in terms of its approximation ratio, ``recourse'' (the number of changes in $S$ per update) and ``update time'' (the time it takes to handle an update). The ultimate goal in this line of research is to obtain a dynamic $O(1)$ approximation algorithm with $\tilde{O}(1)$ recourse and $\tilde{O}(k)$ update time.
Dynamic $k$-median is a canonical example of a class of problems known as dynamic $k$-clustering, that has received significant attention in recent years [Fichtenberger et al, SODA'21], [Bateni et al, SODA'23], [Lacki et al, SODA'24]. To the best of our knowledge, however, all these previous papers either attempt to minimize the algorithm's recourse while ignoring its update time, or minimize the algorithm's update time while ignoring its recourse. For dynamic $k$-median in particular, the state-of-the-art results get $\tilde{O}(k^2)$ update time and $O(k)$ recourse~[Cohen-Addad et al, ICML'19], [Henzinger and Kale, ESA'20], [Bhattacharya et al, NeurIPS'23]. But, this recourse bound of $O(k)$ can be trivially obtained by recomputing an optimal solution from scratch after every update, provided we ignore the update time. In addition, the update time of $\tilde{O}(k^2)$ is polynomially far away from the desired bound of $\tilde{O}(k)$. We come {\em arbitrarily close} to resolving the main open question on this topic, with the following results.
(I) We develop a new framework of {\em randomized local search} that is suitable for adaptation in a dynamic setting. For every $\epsilon > 0$, this gives us a dynamic $k$-median algorithm with $O(1/\epsilon)$ approximation ratio, $\tilde{O}(k^{\epsilon})$ recourse and $\tilde{O}(k^{1+\epsilon})$ update time. This framework also generalizes to dynamic $k$-clustering with $\ell^p$-norm objectives. As a corollary, we obtain similar bounds for the dynamic $k$-means problem, and a new trade-off between approximation ratio, recourse and update time for the dynamic $k$-center problem.
(II) If it suffices to maintain only an estimate of the {\em value} of the optimal $k$-median objective, then we obtain an $O(1)$ approximation algorithm with $\tilde{O}(k)$ update time. We achieve this result by adapting the Lagrangian relaxation framework of [Jain and Vazirani, JACM'01] and a facility location algorithm of [Mettu and Plaxton, FOCS'00] to the dynamic setting.
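As a rough illustration of plain swap-based local search for k-median, which the randomized, dynamic framework above builds on and substantially refines, a simplified static sketch (not the paper's algorithm) could be:

def k_median_cost(points, centers, d):
    """Sum over points of the distance to the nearest center."""
    return sum(min(d(x, y) for y in centers) for x in points)

def local_search_k_median(points, k, d, eps=0.01):
    """Repeatedly apply a single center swap while it improves the cost by a (1 + eps) factor.
    Assumes len(points) >= k; points must be hashable/comparable for the membership test."""
    centers = list(points[:k])
    improved = True
    while improved:
        improved = False
        cost = k_median_cost(points, centers, d)
        for i in range(len(centers)):
            for cand in points:
                if cand in centers:
                    continue
                trial = centers[:i] + [cand] + centers[i + 1:]
                if k_median_cost(points, trial, d) < cost / (1 + eps):
                    centers, improved = trial, True
                    break
            if improved:
                break
    return centers

The sketch recomputes costs from scratch after every swap and is only meant to convey the swap step; the contribution of the paper lies in making this kind of local search randomized and dynamic so that each update is handled in $\tilde{O}(k^{1+\epsilon})$ time.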
Abstract
We consider the classic facility location problem in fully dynamic data streams, where elements can be both inserted and deleted. In this problem, one is interested in maintaining a stable, high-quality solution throughout the data stream while spending only little time per update (insertion or deletion). We study the problem and provide the first algorithm that simultaneously maintains a constant approximation, uses near-linear time per update, and incurs only polylogarithmic recourse per update. We complement our theoretical results with an experimental analysis showing the practical efficiency of our method.
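For concreteness, the facility location objective that such an algorithm maintains approximately can be evaluated as below; the uniform opening cost is an assumption made to keep the sketch short, and this is only the static objective, not the dynamic algorithm itself:

def facility_location_cost(clients, open_facilities, opening_cost, d):
    """Total cost: a uniform opening cost per open facility, plus each client's
    distance to its closest open facility."""
    connection = sum(min(d(c, f) for f in open_facilities) for c in clients)
    return opening_cost * len(open_facilities) + connection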
Correlation Clustering in Constant Many Parallel Rounds
Ashkan Norouzi Fard
Jakub Tarnawski
Slobodan Mitrović
ICML (2022) (to appear)
Abstract
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental evaluation of our techniques.
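As background, the classic sequential Pivot algorithm of Ailon, Charikar and Newman, a 3-approximation in expectation and the usual starting point for parallel variants, is only a few lines; this is a reference point, not the MPC algorithm of the paper:

import random

def pivot_correlation_clustering(nodes, positive_neighbors):
    """Pivot: repeatedly pick a random unclustered node and cluster it together
    with its unclustered '+' neighbors.

    positive_neighbors: dict mapping each node to the (symmetric) set of nodes
    joined to it by a '+' edge; all other pairs are implicitly '-'.
    """
    unclustered = set(nodes)
    clusters = []
    order = list(nodes)
    random.shuffle(order)
    for pivot in order:
        if pivot not in unclustered:
            continue
        cluster = {pivot} | (positive_neighbors.get(pivot, set()) & unclustered)
        clusters.append(cluster)
        unclustered -= cluster
    return clusters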
Near-Optimal Correlation Clustering with Privacy
Ashkan Norouzi Fard
Chenglin Fan
Jakub Tarnawski
Slobodan Mitrović
NeurIPS 2022 (to appear)
Abstract
Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labeling and many more. In the correlation clustering problem one receives as input a set of nodes and, for each node, a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the nodes' specified preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our additive error bound improves on the one shown in prior work and is optimal up to polylogarithmic factors for fixed privacy parameters.
Abstract
In the correlation clustering problem the input is a signed graph where the sign indicates whether each pair of points should be placed in the same cluster or not. The goal of the problem is to compute a clustering which minimizes the number of disagreements with these recommendations. Thanks to its many practical applications, correlation clustering is a fundamental unsupervised learning problem and has been extensively studied in many different settings. In this paper we study the problem in the classic online setting with recourse: the vertices of the graph arrive in an online manner and the goal is to maintain an approximate clustering while minimizing the number of times each vertex changes cluster.
Our main contribution is an algorithm that achieves logarithmic recourse per vertex in the worst case. We also complement this result with a tight lower bound. Finally, we show experimentally that our algorithm achieves better performance than state-of-the-art algorithms on real-world data.
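To make the notion of recourse concrete, here is a toy online baseline that never moves a vertex after its arrival (zero recourse, but with no approximation guarantee); the paper's algorithm instead allows vertices to move, maintaining an approximate clustering with only logarithmic recourse per vertex in the worst case. The streaming interface below is an assumption of this sketch:

def online_greedy_clustering(stream):
    """Toy online baseline: each arriving vertex joins the existing cluster that
    contains the largest number of its '+' neighbors seen so far, or starts a
    singleton cluster. Vertices are never moved afterwards.

    stream yields (vertex, set_of_positive_neighbors_seen_so_far) pairs.
    """
    cluster_of, clusters = {}, []
    for v, pos in stream:
        votes = {}
        for u in pos:
            if u in cluster_of:
                votes[cluster_of[u]] = votes.get(cluster_of[u], 0) + 1
        if votes:
            cid = max(votes, key=votes.get)
        else:
            cid = len(clusters)
            clusters.append(set())
        clusters[cid].add(v)
        cluster_of[v] = cid
    return clusters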
Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant Factor
Debarati Das
Evangelos Kipouridis
Mikkel Thorup
FOCS 2021
Abstract
We consider the numerical taxonomy problem of fitting a positive distance function $\mathcal{D}:{S\choose 2}\rightarrow \mathbb{R}_{>0}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that their distances in $T$ match those in $\mathcal{D}$. A nice application is in evolutionary biology, where the tree $T$ aims to approximate the branching process leading to the observed distances in $\mathcal{D}$ [Cavalli-Sforza and Edwards 1967]. We consider the total error, that is, the sum of distance errors over all pairs of points. We present a polynomial time algorithm minimizing the total error within a constant factor. We can do this both for general trees, and for the special case of ultrametrics with a root having the same distance to all vertices in $S$. The problems are known to be APX-hard, so a constant factor is the best we can hope for. The best previous approximation factor was $\mathcal{O}((\log n)(\log \log n))$ by Ailon and Charikar [2005], who wrote ``Determining whether an $\mathcal{O}(1)$ approximation can be obtained is a fascinating question''.
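Spelling out the objective: given a candidate tree $T$ with positive edge weights, the quantity minimized (within a constant factor) is $\sum_{\{i,j\}} |d_T(i,j) - \mathcal{D}(i,j)|$ over all pairs of input points. A small sketch of evaluating it, with the tree stored as a symmetric weighted adjacency dictionary and $\mathcal{D}$ given as a callable (both representations are assumptions of this sketch, not part of the paper), could be:

from itertools import combinations

def tree_distance(tree, u, v):
    """Distance between u and v in a weighted tree given as {node: {neighbor: weight}}
    (symmetric adjacency). In a tree the u-v path is unique, so a DFS carrying the
    accumulated distance is enough."""
    stack, seen = [(u, 0.0)], {u}
    while stack:
        node, dist = stack.pop()
        if node == v:
            return dist
        for nbr, w in tree.get(node, {}).items():
            if nbr not in seen:
                seen.add(nbr)
                stack.append((nbr, dist + w))
    raise ValueError("u and v are not connected")

def total_error(D, tree, points):
    """Sum over all pairs {i, j} of |d_T(i, j) - D(i, j)|, where D is a callable."""
    return sum(abs(tree_distance(tree, i, j) - D(i, j)) for i, j in combinations(points, 2))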
Abstract
In this paper we present an efficient reachability oracle under single-edge or single-vertex failures for planar directed graphs. Specifically, we show that a planar digraph G can be preprocessed in O(n log^2 n / log log n) time, producing an O(n log n)-space data structure that can answer in O(log n) time whether u can reach v in G if the vertex x (the edge f) is removed from G, for any query vertices u, v and failed vertex x (failed edge f). To the best of our knowledge, this is the first data structure for planar directed graphs with nearly-optimal preprocessing time that is capable of answering all-pairs queries under any kind of failures in polylogarithmic time.
We also consider 2-reachability problems, where we are given a planar digraph G and we wish to determine if there are two vertex-disjoint (edge-disjoint) paths from u to v, for query vertices u, v. In this setting we provide a nearly-optimal 2-reachability oracle, which is the existential variant of the reachability oracle under single failures, with the following bounds. We can construct in O(n polylog n) time an O(n log^{3+o(1)} n)-space data structure that can check in O(log^{2+o(1)} n) time, for any query vertices u, v, whether v is 2-reachable from u, or otherwise find some separating vertex (edge) x lying on all paths from u to v in G.
To obtain our results, we follow the general recursive approach of Thorup for reachability in planar graphs [J. ACM '04] and we present new data structures which generalize dominator trees and previous data structures for strong connectivity under failures [Georgiadis et al., SODA '17]. Our new data structures also work for general digraphs and may be of independent interest.
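For contrast with the O(log n) query time above, the same single-failure reachability query can be answered naively in O(n + m) time per query by deleting the failed vertex or edge and re-running a graph search; a sketch of that baseline (the adjacency-list representation is an assumption of this sketch) is:

from collections import deque

def reaches_avoiding(adj, u, v, failed_vertex=None, failed_edge=None):
    """Naive O(n + m) check: can u reach v in the digraph adj after the failure?

    adj: dict mapping each vertex to a list of out-neighbors.
    failed_edge, if given, is a directed pair (x, y).
    """
    if u == failed_vertex or v == failed_vertex:
        return False
    queue, seen = deque([u]), {u}
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y == failed_vertex or (x, y) == failed_edge or y in seen:
                continue
            seen.add(y)
            queue.append(y)
    return False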
Faster Algorithms for All-Pairs LCA in DAGs
Aleksander Łukasiewicz
Fabrizio Grandoni
Giuseppe F. Italiano
Przemysław Uznański
SODA 2021 (to appear)
Abstract
Let G = (V, E) be an n-vertex directed acyclic graph (DAG). A lowest common ancestor (LCA) of two vertices u and v is a common ancestor w of u and v such that no descendant of w has the same property. In this paper, we consider the problem of computing an LCA, if any, for all pairs of vertices in a DAG. The fastest known algorithms for this problem exploit fast matrix multiplication subroutines, and have running time ranging from O(n^{2.687}) [Bender et al. SODA'01] down to O(n^{2.615}) [Kowaluk and Lingas ICALP'05] and O(n^{2.569}) [Czumaj et al. TCS'07].
In this paper we make the first improvement to the running time in 13 years, namely we present an O(n^{2.447})-time algorithm for this problem. Our key tool is a fast algorithm to partition the vertex set of the transitive closure of G into a collection of O(l) chains and O(n/l) antichains, for a given parameter l. As usual, a chain is a path while an antichain is an independent set. We then find, for all pairs of vertices, a candidate LCA among the chain and antichain vertices, separately. The first set is obtained via a reduction to min-max matrix multiplication. The computation of the second set can be reduced to Boolean matrix multiplication, similarly to previous results on this problem. We finally combine the two solutions together in a careful (non-obvious) manner.
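For intuition, a straightforward baseline (far slower than the O(n^{2.447}) bound above, and not the paper's algorithm) computes each vertex's ancestor set explicitly and, for every pair, reports the common ancestor that appears latest in a topological order; such a vertex has no common-ancestor descendant, so it is an LCA. A sketch, with the DAG given as a dictionary of out-neighbor lists (an assumption of this sketch):

from itertools import combinations

def all_pairs_lca(adj):
    """Naive all-pairs LCA for a DAG adj = {v: list of out-neighbors}.

    Treats every vertex as an ancestor of itself. Returns a dict mapping
    (u, v) to one LCA of u and v, omitting pairs with no common ancestor.
    """
    nodes = list(adj) + [w for vs in adj.values() for w in vs if w not in adj]
    nodes = list(dict.fromkeys(nodes))            # drop duplicates, keep order

    # Topological order via Kahn's algorithm (repeatedly remove in-degree-0 vertices).
    indeg = {v: 0 for v in nodes}
    for v in nodes:
        for w in adj.get(v, []):
            indeg[w] += 1
    order, frontier = [], [v for v in nodes if indeg[v] == 0]
    while frontier:
        v = frontier.pop()
        order.append(v)
        for w in adj.get(v, []):
            indeg[w] -= 1
            if indeg[w] == 0:
                frontier.append(w)
    rank = {v: i for i, v in enumerate(order)}

    # ancestors[v] = all vertices that can reach v (including v itself).
    ancestors = {v: {v} for v in nodes}
    for v in order:                               # ancestors of v are final when v is processed
        for w in adj.get(v, []):
            ancestors[w] |= ancestors[v]

    lca = {}
    for u, v in combinations(nodes, 2):
        common = ancestors[u] & ancestors[v]
        if common:
            lca[(u, v)] = max(common, key=rank.get)   # latest in topological order
    return lca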