# Silvio Lattanzi

Silvio received his bachelor (2005), master (2007) and PhD(2011) degree from the Computer Science department of Sapienza University of Rome, under the supervision of Alessandro Panconesi. Silvio joined Google Research in the New York office in January 2011. Since April 2017 Silvio moved to Google Research Zurich.

Authored Publications

Sort By

Active Learning of Classifiers with Label and Seed Queries

Andrea Paudice

Marco Bressan

Maximilian Thiessen

Nicolo Cesa-Bianchi

NeurIPS 2022 (to appear)

Preview abstract
We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \R^m$, we want to learn any unknown classifier on $X$ whose classes have finite \emph{strong convex hull margin}, a new notion extending the SVM margin.
On the other hand, using the more powerful \emph{seed} queries (a variant of equivalence queries), the target classifier could be learned in $\scO(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient.
In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\poly(n+m)$ using only $\scO(m^2 \log n)$ label queries and $\scO\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement these upper bounds by showing that in the worst case any algorithm needs $\Omega\big(\frac{k m \log \nicefrac{1}{\gamma}}{\log m}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.
View details

Scalable Differentially Private Clustering via Hierarchically Separated Trees

Chris Schwiegelshohn

David Saulpic

2022 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2022) (to appear)

Preview abstract
We study the private $k$-median and $k$-means clustering problem in $d$ dimensional Euclidean space.
By leveraging tree embeddings, we give an efficient and easy to implement algorithm, that is empirically competitive with state of the art non private methods.
We prove that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee. (The dimension term, $d$, can be replaced with $O(\log k)$ using standard dimension reduction techniques.) Although the worst-case guarantee is worse than that of state of the art private clustering methods, the algorithm we propose is practical, runs in near-linear, $\tilde{O}(nkd)$, time and scales to tens of millions of points. We also show that our method is amenable to parallelization in large-scale distributed computing environments. In particular we show that our private algorithms can be implemented in logarithmic number of MPC rounds in the sublinear memory regime.
Finally, we complement our theoretical analysis with an empirical evaluation demonstrating the algorithm's efficiency and accuracy in comparison to other privacy clustering baselines.
View details

Near-Optimal Correlation Clustering with Privacy

Ashkan Norouzi Fard

Chenglin Fan

Jakub Tarnawski

Slobodan Mitrović

NeurIPS 2022 (2022) (to appear)

Preview abstract
Correlation clustering is a central problem in unsupervised learning, with applications spanning community detection, duplicate detection, automated labeling and many more. In the correlation clustering problem one receives as input a set of nodes and for each node a list of co-clustering preferences, and the goal is to output a clustering that minimizes the disagreement with the specified nodes' preferences. In this paper, we introduce a simple and computationally efficient algorithm for the correlation clustering problem with provable privacy guarantees. Our additive error is stronger than the one shown in prior work and is optimal up to polylogarithmic factors for fixed privacy parameters.
View details

Correlation Clustering in Constant Many Parallel Rounds

Ashkan Norouzi Fard

Jakub Tarnawski

Slobodan Mitrović

ICML (2022) (to appear)

Preview abstract
Correlation clustering is a central topic in unsupervised learning, with many applications in ML and data mining. In correlation clustering, one receives as input a signed graph and the goal is to partition it to minimize the number of disagreements. In this work we propose a massively parallel computation (MPC) algorithm for this problem that is considerably faster than prior work. In particular, our algorithm uses machines with memory sublinear in the number of nodes in the graph and returns a constant approximation while running only for a constant number of rounds. To the best of our knowledge, our algorithm is the first that can provably approximate a clustering problem using only a constant number of MPC rounds in the sublinear memory regime. We complement our analysis with an experimental scalability\nnote{I would remove "scalability": it is not clear that this will be demonstrated with mid-sized graphs} evaluation of our techniques.
View details

Deletion Robust Submodular Maximization over Matroids

Ashkan Norouzi Fard

Federico Fusco

Paul Duetting

ICML'22 (2022)

Preview abstract
Maximizing a monotone submodular function is a fundamental task in machine learning. In this paper we study the deletion robust version of the problem under the classic matroids constraint. Here the goal is to extract a small size summary of the dataset that contains a high value independent set even after an adversary deleted some elements. We present constant-factor approximation algorithms, whose space complexity depends on the rank $k$ of the matroid and the number $d$ of deleted elements. In the centralized setting we present a $(3.582+O(\eps))$-approximation algorithm with summary size $O(k + \frac{d \log k}{\eps^2})$. In the streaming setting we provide a $(5.582+O(\eps))$-approximation algorithm with summary size and memory $O(k + \frac{d \log k}{\eps^2})$. We complement our theoretical results with an in-depth experimental analysis showing the effectiveness of our algorithms on real-world datasets.
View details