Adversarial Robustness of Streaming Algorithms through Importance Sampling
Abstract
Robustness against adversarial attacks have recently been at the forefront of algorithmic design for machine learning tasks. In the adversarial streaming model, an adversary gives an algorithm a sequence of adaptively chosen updates $u_1,\ldots,u_n$ as a data stream.
The goal of the algorithm is to compute or approximate some predetermined function for every prefix of the adversarial stream, but the adversary may generate future updates based on previous outputs of the algorithm.
In particular, the adversary may gradually learn the random bits internally used by an algorithm to manipulate dependencies in the input.
This is especially problematic as many important problems in the streaming model require randomized algorithms, as they are known to not admit any deterministic algorithms that use sublinear space.
In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction.
For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model.
Our results are based on a simple, but powerful, observation that sampling based algorithms give rise to adversarial robustness which is in contrast to sketching based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks.
In addition, we show that the well-known merge and reduce paradigm in streaming is adversarially robust.
Since the merge and reduce paradigm defines coreset constructions, we thus obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA) and non-negative matrix factorization.
To the best of our knowledge, these are the first adversarially robust methods for these problems.
The goal of the algorithm is to compute or approximate some predetermined function for every prefix of the adversarial stream, but the adversary may generate future updates based on previous outputs of the algorithm.
In particular, the adversary may gradually learn the random bits internally used by an algorithm to manipulate dependencies in the input.
This is especially problematic as many important problems in the streaming model require randomized algorithms, as they are known to not admit any deterministic algorithms that use sublinear space.
In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction.
For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model.
Our results are based on a simple, but powerful, observation that sampling based algorithms give rise to adversarial robustness which is in contrast to sketching based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks.
In addition, we show that the well-known merge and reduce paradigm in streaming is adversarially robust.
Since the merge and reduce paradigm defines coreset constructions, we thus obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA) and non-negative matrix factorization.
To the best of our knowledge, these are the first adversarially robust methods for these problems.