# Alessandro Epasto

I am a staff research scientist at Google in New York, working in the Algorithms and Optimization team led by Vahab Mirrokni.
I received a Ph.D. in computer science from Sapienza University of Rome, where I was advised by Professor Alessandro Panconesi and supported by a 2011 Google Europe Ph.D. Fellowship in Algorithms.
I was also a postdoctoral researcher in the Department of Computer Science at Brown University in Providence, RI, USA, where I was advised by Professor Eli Upfal.
My research interests include algorithmic problems in machine learning and data mining, in particular in the areas of clustering, privacy, and large-scale graph analysis.

## Publications


Scalable Differentially Private Clustering via Hierarchically Separated Trees

Chris Schwiegelshohn

David Saulpic

2022 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2022) (to appear)

Abstract
We study the private $k$-median and $k$-means clustering problem in $d$ dimensional Euclidean space.
By leveraging tree embeddings, we give an efficient and easy-to-implement algorithm that is empirically competitive with state-of-the-art non-private methods.
We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee. (The dimension term, $d$, can be replaced with $O(\log k)$ using standard dimension reduction techniques.) Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical, runs in near-linear, $\tilde{O}(nkd)$, time, and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular, we show that our private algorithms can be implemented in a logarithmic number of MPC rounds in the sublinear memory regime.
Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy relative to other private clustering baselines.
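The tree-embedding idea above can be illustrated with a much-simplified sketch: a single level of a randomly shifted grid (a one-level stand-in for a hierarchically separated tree), Laplace noise added to each cell count so the released counts are differentially private, and the centers of the $k$ noisily-heaviest cells returned. All names and parameters here are illustrative; this is not the paper's algorithm.

```python
import math
import random
from collections import Counter

def private_grid_centers(points, k, cell, eps, seed=0):
    """Toy sketch: bucket 2-D points into a randomly shifted grid,
    add Laplace(1/eps) noise to each cell count (each point touches
    exactly one cell, so the counts have sensitivity 1), and return
    the centers of the k cells with the largest noisy counts."""
    rng = random.Random(seed)
    shift = (rng.uniform(0, cell), rng.uniform(0, cell))
    counts = Counter()
    for x, y in points:
        counts[(int((x + shift[0]) // cell),
                int((y + shift[1]) // cell))] += 1

    def laplace(scale):
        # Inverse-CDF sampling of the Laplace distribution.
        u = rng.random() - 0.5
        return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

    noisy = {c: n + laplace(1.0 / eps) for c, n in counts.items()}
    top = sorted(noisy, key=noisy.get, reverse=True)[:k]
    return [((i + 0.5) * cell - shift[0], (j + 0.5) * cell - shift[1])
            for i, j in top]
```

The actual algorithm uses a full hierarchy of such grids (an HST) with a careful allocation of the privacy budget across levels; the sketch keeps only the single-level intuition.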

Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank

Advances in Neural Information Processing Systems (2022)

Abstract
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: a difference of only one edge may cause a large change in the PPR vector, potentially leaking private user data.
In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performance of our algorithms.
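A standard building block behind approximate PPR is the forward-push algorithm, sketched below. This is a generic textbook version, not the paper's sensitivity-bounded variant, which additionally caps how much any single edge can contribute to the output:

```python
def approx_ppr(adj, source, alpha=0.15, tol=1e-5):
    """Forward-push approximation of the Personalized PageRank vector
    from `source`. `adj` maps each node to a list of out-neighbors.
    Residual mass at a node is pushed to its neighbors until every
    residual is below tol times the node's out-degree."""
    p = {u: 0.0 for u in adj}   # settled PPR mass
    r = {u: 0.0 for u in adj}   # residual (unsettled) mass
    r[source] = 1.0
    active = True
    while active:
        active = False
        for u in list(adj):
            deg = max(len(adj[u]), 1)
            if r[u] >= tol * deg:
                active = True
                p[u] += alpha * r[u]               # settle an alpha fraction
                share = (1 - alpha) * r[u] / deg   # push the rest onward
                r[u] = 0.0
                for v in adj[u]:
                    r[v] += share
    return p
```

On a directed 3-cycle a→b→c→a, the settled mass decays with distance from the source, so `p['a'] > p['b'] > p['c']`; the residuals left behind bound the approximation error.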

Near-Optimal Private and Scalable k-Clustering

Shyam Narayanan

NeurIPS 2022

Abstract
We study the differentially private (DP) $k$-means and $k$-median clustering problems of $n$ points in $d$-dimensional Euclidean space in the massively parallel computation (MPC) model. We provide two near-optimal algorithms, where the near-optimality is in three aspects: they both achieve (1) $O(1)$ parallel computation rounds, (2) near-linear-in-$n$ and polynomial-in-$k$ total computational work (i.e., near-linear running time in the sequential setting), and (3) $O(1)$ relative approximation and $\text{poly}(k, d)$ additive error, where $\Omega(1)$ relative approximation is provably necessary even for any polynomial-time non-private algorithm, and $\Omega(k)$ additive error is a provable lower bound for any polynomial-time DP $k$-means/median algorithm. Our two algorithms provide a tradeoff between the relative approximation and the additive error: the first has $O(1)$ relative approximation and $\sim (k^{2.5} + k^{1.01} \sqrt{d})$ additive error, and the second achieves a $(1+\gamma)$ relative approximation to the optimal non-private algorithm for an arbitrarily small constant $\gamma>0$, with $\text{poly}(k, d)$ additive error for a larger polynomial dependence on $k$ and $d$.
To achieve our result, we develop a general framework which partitions the data and reduces the DP clustering problem for the entire dataset to the DP clustering problem for each part. To control the blow-up of the additive error introduced by each part, we develop a novel charging argument which might be of independent interest.

Fair Correlation Clustering

23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020) (2020) (to appear)

Abstract
In this paper, we study correlation clustering under fairness constraints. Fair variants of k-median and k-center clustering have been studied recently, and approximation algorithms using a notion called fairlet decomposition have been proposed. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints.
Our results hinge on obtaining a fairlet decomposition for correlation clustering by introducing a novel combinatorial optimization problem. We define a fairlet decomposition with cost similar to the k-median cost and this allows us to obtain approximation algorithms for a wide range of fairness constraints.
We complement our theoretical results with an in-depth analysis of our algorithms on real graphs where we show that fair solutions to correlation clustering can be obtained with limited increase in cost compared to the state-of-the-art (unfair) algorithms.

Sliding Window Algorithms for k-Clustering Problems

Michele Borassi

NeurIPS 2020 (to appear)

Abstract
The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each arrival rather than recomputing it from scratch. In this work, we focus on $k$-clustering problems such as $k$-means and $k$-median. In this setting, we give simple and practical algorithms that come with stronger performance guarantees than previously known results. Empirically, we show that our methods store only a small fraction of the data, are orders of magnitude faster, and find solutions with cost only slightly worse than those returned by algorithms that have access to the full dataset.

Fair Hierarchical Clustering

Benjamin Moseley

Marina Knittel

Yuyan Wang

NeurIPS 2020

Abstract
As machine learning has become more and more integrated into our businesses and lifestyles, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering.
In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to optimize a certain objective introduced by Dasgupta. For various natural objectives, we obtain simple, efficient algorithms to find a provably good fair hierarchical clustering. Empirically, we show that our algorithms can find a fair hierarchical clustering, surprisingly, with only a negligible loss in the objective.

Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection

Vaggos Chatziafratis

AISTATS (2020) (to appear)

Abstract
Hierarchical clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it long had an underdeveloped analytical foundation; to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage), but also designing new ones. Here, for the maximization dual of Dasgupta's objective (introduced by Moseley-Wang), we present polynomial-time 42.46% approximation algorithms that use Max-Uncut Bisection as a subroutine. The previous best worst-case approximation factor in polynomial time was 33.6%, improving only slightly over Average-Linkage, which achieves 33.3%. Finally, we complement our positive results by providing APX-hardness (even for 0-1 similarities) under the Small Set Expansion hypothesis.

On-Device Algorithms for Public-Private Data with Absolute Privacy

Proceedings of The Web Conference 2019 (WWW'19) (to appear)

Abstract
Motivated by the increasing need to preserve privacy in digital devices, we introduce the on-device public-private model of computation.
Our motivation comes from social-network based recommender systems in which the users want to receive recommendations based on the information available on their devices, as well as the suggestions of their social contacts, without sharing such information or contacts with the central recommendation system.
Our model allows us to solve many algorithmic problems while providing absolute (deterministic) guarantees of the privacy of on-device data and the user's contacts. In fact, we ensure that the private data and private contacts are never revealed to the central system.
Our restrictive model of computation presents several interesting algorithmic challenges because any computation based on private information and contacts must be performed on local devices of limited capabilities. Despite these challenges, under realistic assumptions of inter-device communication, we show several efficient algorithms for fundamental data mining and machine learning problems, ranging from k-means clustering to heavy hitters. We complement this analysis with strong impossibility results for efficient private algorithms without allowing inter-device communication. In our experimental evaluation, we show that our private algorithms provide results almost as accurate as those of the non-private ones while speeding up the on-device computations by orders of magnitude.

Scalable diversity maximization via small-size composable core-sets

31st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) (2019)

Abstract
Maximizing diversity is a central problem in information retrieval and data mining systems, with prominent applications in web search, recommender systems, news aggregators and product search. In this paper, we study a diversity maximization problem (a.k.a. the maximum dispersion problem) in which, given a set of n objects in a metric space, one wants to find a subset of k objects with the maximum sum of pairwise distances.
To solve this problem in a scalable distributed manner, we apply a novel distributed framework for tackling large-scale problems known as randomized composable core-sets: divide the big data set into smaller parts, solve the problem for each part, combine the solutions from each part, and solve the problem on the union of these solutions. Our algorithms improve significantly over the approximation guarantees of state-of-the-art core-set-based algorithms while using the minimum possible intermediate output size. In particular, we present a simple distributed algorithm that achieves an almost optimal communication complexity and, moreover, asymptotically achieves an approximation factor of 1/2, which is the best possible approximation factor for the global optimization problem under certain complexity-theoretic assumptions.
Our algorithms are scalable and practical, as shown by our extensive empirical evaluation with large datasets, and they can be easily adapted to major distributed computing systems like MapReduce. Furthermore, we show empirically that, in real-life instances, our algorithms reach close-to-optimal solutions with an approximation factor of > 90%. This far exceeds the worst-case approximation barrier for the problem and provides useful output.
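The divide-combine-solve recipe described in the abstract can be sketched in a few lines. Here a greedy farthest-point traversal plays the role of the per-part solver; the function names and the choice of solver are illustrative, not the paper's exact algorithm:

```python
import random

def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def farthest_point(points, k):
    """Greedy farthest-point traversal: start anywhere, then repeatedly
    add the point farthest from everything chosen so far. A classic
    heuristic for diversity (max-dispersion) objectives."""
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        chosen.append(max(points,
                          key=lambda p: min(dist(p, c) for c in chosen)))
    return chosen

def distributed_diversity(points, k, parts=4, seed=0):
    """Composable core-set sketch: randomly partition the data, extract
    a size-k core-set from each part, then solve on the union."""
    rng = random.Random(seed)
    buckets = [[] for _ in range(parts)]
    for p in points:
        buckets[rng.randrange(parts)].append(p)
    union = [p for b in buckets if b for p in farthest_point(b, k)]
    return farthest_point(union, k)
```

Each bucket can be processed on a different machine; only k points per bucket ever travel to the aggregator, which is what keeps the intermediate output small.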

Better Sliding Window Algorithms to Maximize Subadditive and Diversity Objectives

Michele Borassi

Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS 2019)

Abstract
The streaming computation model is a standard model for large-scale data analysis: the input arrives one element at a time, and the goal is to maintain an approximately optimal solution using only a constant, or, at worst, poly-logarithmic space.
In practice, however, recency plays a large role, and one often wishes to consider only the last w elements that have arrived, the so-called sliding window problem. A trivial approach is to simply store the last w elements in a buffer; our goal is to develop algorithms with space and update time sublinear in w. In this regime, there are two frameworks: exponential histograms and smooth histograms, which can be used to obtain sliding window algorithms for families of functions satisfying certain properties.
Unfortunately, these frameworks have limitations and cannot always be applied directly. A prominent example is the case of a submodular function with cardinality constraints. Some of these difficulties can be rectified, but often only on a case-by-case basis. Here, we describe an alternative approach to designing efficient sliding window algorithms for maximization problems. Then we instantiate this approach on a wide range of problems, yielding better algorithms for submodular function optimization, diversity optimization and general subadditive optimization. In doing so, we improve state-of-the-art results obtained using problem-specific algorithms.

Incentive-Aware Learning for Large Markets

Proceedings of the 2018 World Wide Web Conference on World Wide Web, WWW 2018, Lyon, France, April 23-27, 2018, pp. 1369-1378

Abstract
In a typical learning problem, one key step is to use training data to pick one model from a collection of models that optimizes an objective function. In many multi-agent settings, the training data is generated through the actions of the agents, and the model is used to make a decision (e.g., how to sell an item) that affects the agents. An illustrative example is the problem of learning the reserve price in an auction. In such cases, the agents have an incentive to influence the training data (e.g., by manipulating their bids in the case of an auction) to game the system and achieve a more favorable outcome. In this paper, we study this incentive-aware learning problem in a general setting and show that it is possible to approximately optimize the objective function under two assumptions: (i) each individual agent is a "small" part of the market; and (ii) there is a cost associated with manipulation. For our illustrative application, this nicely translates to a mechanism for setting approximately optimal reserve prices in auctions where no individual agent has significant market share. For this application, we also show that the second assumption (that manipulations are costly) is not necessary, since we can "perturb" any auction to make it costly for the agents to manipulate.

Bicriteria Distributed Submodular Maximization in a Few Rounds

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures (2017)

Abstract
We study the problem of efficiently optimizing submodular functions under cardinality constraints in a distributed setting. Several distributed algorithms for this problem have recently been introduced, but they either achieve a sub-optimal solution or run in a super-constant number of rounds of computation. Unlike previous work, we aim to design distributed algorithms running in multiple rounds with almost optimal approximation guarantees, at the cost of outputting a larger number of elements. Toward this goal, we present a distributed algorithm that, for any ε > 0 and any constant r, outputs a set S of O(rk/ε^(1/r)) items in r rounds and achieves a (1-ε)-approximation of the value of the optimum set with k items. This is the first distributed algorithm that achieves an approximation factor of (1-ε) in fewer than log(1/ε) rounds. We also prove a hardness result showing that the output of any (1-ε)-approximation distributed algorithm limited to one distributed round must have at least Ω(k/ε) items. In light of this hardness result, our one-round algorithm (r = 1) is asymptotically tight in terms of output size. We support the theoretical guarantees with an extensive empirical study of our algorithm showing that achieving almost optimal solutions is indeed possible in a few rounds for large-scale real datasets.
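The flavor of the one-round (r = 1) case can be sketched with a coverage function standing in for the submodular objective. Everything here (the function names, the choice of coverage as f) is a hypothetical illustration, not the paper's algorithm:

```python
import random

def greedy(elements, k, f):
    """Standard greedy for monotone submodular maximization under a
    cardinality constraint: repeatedly add the element with the largest
    marginal gain f(S + e) - f(S)."""
    chosen = []
    for _ in range(min(k, len(elements))):
        best = max((e for e in elements if e not in chosen),
                   key=lambda e: f(chosen + [e]) - f(chosen))
        chosen.append(best)
    return chosen

def one_round_distributed(elements, k, f, machines=4, seed=0):
    """One-round distributed sketch: partition the ground set, run
    greedy on each machine, then run greedy on the union of picks."""
    rng = random.Random(seed)
    parts = [[] for _ in range(machines)]
    for e in elements:
        parts[rng.randrange(machines)].append(e)
    union = [e for part in parts if part for e in greedy(part, k, f)]
    return greedy(union, k, f)
```

With f as set coverage (elements are sets, f(S) is the size of their union), the final greedy pass still recovers an exact cover on easy instances, even though each machine saw only a slice of the data.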

Submodular Optimization Over Sliding Windows

Proceedings of the 26th International World Wide Web Conference, WWW (2017)

Abstract
Maximizing submodular functions under cardinality constraints lies at the core of numerous data mining and machine learning applications, including data diversification, data summarization, and coverage problems. In this work, we study this question in the context of data streams, where elements arrive one at a time, and we want to design low-memory, fast-update-time algorithms that maintain a good solution. Specifically, we focus on the sliding window model, where we are asked to maintain a solution that considers only the last $W$ items.
In this context, we provide the first non-trivial algorithm that maintains a provable approximation of the optimum using space sublinear in the size of the window. In particular, we give a $1/3 - \epsilon$ approximation algorithm that uses space polylogarithmic in the spread of the values of the elements and linear in the solution size $k$, for any constant $\epsilon > 0$. At the same time, processing each element requires only a polylogarithmic number of evaluations of the function itself. When a better approximation is desired, we show a different algorithm that, at the cost of using more memory, provides a $1/2 - \epsilon$ approximation and allows a tunable trade-off between average update time and space. This algorithm matches the best known approximation guarantees for submodular optimization in insertion-only streams, a less general formulation of the problem.
We demonstrate the efficacy of the algorithms on a number of real-world datasets, showing that their practical performance far exceeds the theoretical bounds. The algorithms preserve high-quality solutions in streams with millions of items while storing a negligible fraction of them.


The Spread of Physical Activity Through Social Networks

Proceedings of the 26th International World Wide Web Conference 2017, WWW

Abstract
We study the evolution of the daily physical activity of 44.5K users on the Fitbit social network over a period of eight months. A time-aggregated analysis shows that average alter activity, gender, and body mass index (BMI) are significantly predictive of ego activity when controlling for ego BMI, gender, and number of friends. The direction and effect size of the associations surfaced vary when considering chronic conditions self-reported by a large portion of the users, including diabetes, dyslipidemia, hypertension and depression. When considering the co-evolution of activity and friendship on a month-by-month basis in a within-subject analysis, we show via fixed-effects modeling that fluctuations in average alter activity significantly predict fluctuations in ego activity. Finally, we investigate the causal factors that may drive change in physical activity over time. We leverage a class of novel non-parametric statistical tests to rule out homophily as the sole source of dependence in activity, even in the presence of unobserved individual traits.

Ego-net Community Mining Applied to Friend Suggestion

Ismail Sebe

Ahmed Taei

Sunita Verma

Proceedings of VLDB (2016)

Abstract
In this paper, we present a study of the community structure of ego-networks (the graphs representing the connections among the neighbors of a node) for several online social networks. Toward this goal, we design a new technique to efficiently build and cluster all the ego-nets of a graph in parallel (note that even just building the ego-nets efficiently is challenging on large networks). Our experimental findings are quite compelling: at a microscopic level it is easy to detect high-quality communities.
Leveraging this fact, we then develop new features for friend suggestion based on co-occurrences of two nodes in different ego-nets' communities. Our new features can be computed efficiently on very large-scale graphs by analyzing just the neighborhood of each node. Furthermore, we prove formally on a stylized model, and show by experimental analysis, that this new similarity measure outperforms the classic local features employed for friend suggestion.
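A toy version of the ego-net pipeline: build the induced subgraph on each node's neighbors, split it into communities (plain connected components stand in for the paper's clustering step), and score candidate pairs by how many ego-net communities they co-occur in. This is a hypothetical illustration of the idea, not the paper's implementation:

```python
from collections import defaultdict
from itertools import combinations

def ego_community_cooccurrence(adj):
    """adj maps each node to the set of its neighbors (undirected).
    Returns a score for each node pair: the number of ego-net
    communities (here, connected components of an ego-net) in which
    the two nodes appear together."""
    score = defaultdict(int)
    for u, nbrs in adj.items():
        seen = set()
        for start in nbrs:
            if start in seen:
                continue
            # DFS restricted to the ego-net of u (u itself excluded).
            comp, stack = set(), [start]
            while stack:
                v = stack.pop()
                if v in comp:
                    continue
                comp.add(v)
                seen.add(v)
                stack.extend(w for w in adj[v]
                             if w in nbrs and w not in comp)
            for a, b in combinations(sorted(comp), 2):
                score[(a, b)] += 1
    return dict(score)
```

Note that each ego-net touches only one node's neighborhood, which is why this kind of feature parallelizes naturally over nodes.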

Reduce and aggregate: similarity ranking in multi-categorical bipartite graphs

Stefano Leonardi

WWW (2014), pp. 349-360

Abstract
We study the problem of computing similarity rankings in large-scale multi-categorical bipartite graphs, where the two sides of the graph represent actors and items, and the items are partitioned into an arbitrary set of categories. The problem has several real-world applications, including identifying competing advertisers and suggesting related queries in an online advertising system or finding users with similar interests and suggesting content to them. In these settings, we are interested in computing on-the-fly rankings of similar actors, given an actor and an arbitrary subset of categories of interest. Two main challenges arise: First, the bipartite graphs are huge and often lopsided (e.g. the system might receive billions of queries while presenting only millions of advertisers). Second, the sheer number of possible combinations of categories prevents the pre-computation of the results for all of them. We present a novel algorithmic framework that addresses both issues for the computation of several graph-theoretical similarity measures, including # common neighbors, and Personalized PageRank. We show how to tackle the imbalance in the graphs to speed up the computation and provide efficient real-time algorithms for computing rankings for an arbitrary subset of categories. Finally, we show experimentally the accuracy of our approach with real-world data, using both public graphs and a very large dataset from Google AdWords.
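The category-restricted common-neighbors measure from the abstract can be sketched directly; the names below are illustrative, and the real system adds the reduce-and-aggregate machinery needed to handle lopsided, web-scale graphs:

```python
from collections import defaultdict

def similar_actors(edges, item_category, query, categories):
    """Rank actors by the number of items they share with `query`,
    counting only items whose category is in the requested subset.
    `edges` is a list of (actor, item) pairs of a bipartite graph."""
    items_of = defaultdict(set)    # actor -> items, restricted to categories
    actors_of = defaultdict(set)   # item  -> actors, restricted likewise
    for actor, item in edges:
        if item_category[item] in categories:
            items_of[actor].add(item)
            actors_of[item].add(actor)
    common = defaultdict(int)
    for item in items_of[query]:
        for actor in actors_of[item]:
            if actor != query:
                common[actor] += 1
    return sorted(common, key=common.get, reverse=True)
```

Because the category filter is applied at query time, the same index answers rankings for any subset of categories without precomputing all combinations.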

Communities, Random Walks, and Social Sybil Defense

Lorenzo Alvisi

Allen Clement

Alessandro Panconesi

Internet Mathematics (2014)

SoK: The Evolution of Sybil Defense via Social Networks

Lorenzo Alvisi

Allen Clement

Alessandro Panconesi

2013 IEEE Symposium on Security and Privacy, SP 2013

Abstract
Sybil attacks in which an adversary forges a potentially unbounded number of identities are a danger to distributed systems and online social networks. The goal of sybil defense is to accurately identify sybil identities. This paper surveys the evolution of sybil defense protocols that leverage the structural properties of the social graph underlying a distributed system to identify sybil identities. We make two main contributions. First, we clarify the deep connection between sybil defense and the theory of random walks. This leads us to identify a community detection algorithm that, for the first time, offers provable guarantees in the context of sybil defense. Second, we advocate a new goal for sybil defense that addresses the more limited, but practically useful, goal of securely white-listing a local region of the graph.
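The random-walk connection can be made concrete with a tiny sketch: run a lazy random walk from a trusted seed for a few steps; honest nodes, being well connected to the seed, accumulate most of the probability mass, while a sybil region reachable only through a few attack edges receives little. This is a generic illustration of the principle, not any specific protocol from the paper:

```python
def lazy_walk(adj, seed, steps):
    """Distribution of a lazy random walk (stay put with probability
    1/2) after `steps` steps from `seed`. adj maps each node to a
    non-empty list of neighbors."""
    p = {u: 0.0 for u in adj}
    p[seed] = 1.0
    for _ in range(steps):
        nxt = {u: 0.5 * p[u] for u in adj}   # lazy self-loop
        for u, nbrs in adj.items():
            share = 0.5 * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p
```

White-listing a node by thresholding its (degree-normalized) walk probability is exactly the "local region" flavor of defense the paper advocates.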

Reconstructing Hidden Permutations Using the Average-Precision (AP) Correlation Statistic

Lorenzo De Stefani

Eli Upfal

Fabio Vandin

In Proceedings of the 30th AAAI Conference on Artificial Intelligence 2016

Spreading Rumours without the Network

Pawel Brach

Alessandro Panconesi

Piotr Sankowski

In Proceedings of the 2nd ACM Conference on Online Social Networks (COSN), 2014