A Characterization of Semi-Supervised Adversarially-Robust PAC Learnability

Idan Attias
Steve Hanneke
Yishay Mansour
NeurIPS 2022
Google Scholar


We study the problem of \textit{semi-supervised} learning of an adversarially-robust predictor in the $\pac$ model, where the learner has access to both \textit{labeled} and \textit{unlabeled} examples. The sample complexity in semi-supervised learning has two parameters, the number of labeled examples and the number of unlabeled examples. We consider the complexity measures, $\VCU\leq \dim_\U\leq \VC$ and $\VC^*$, where $\VC$ is the standard $\VC$-dimension, $\VC^*$ is its dual, and the other two measures appeared in \cite{montasser2019vc}. The best sample bound for robust supervised PAC learning is $\Lambda=O(\VC\cdot \VC^*)$, and we will compare our sample bounds to $\Lambda$. Our main results are the following: (1) in the realizable setting it is sufficient to have $\O(\VCU)$ labeled examples and $\O(\Lambda)$ unlabeled examples. (2) In the agnostic setting, let $\eta$ be the minimal error. The sample complexity depends, if we allow an error of $2\eta+\epsilon$, in which case it is still sufficient to have $\O(\VCU)$ labeled examples and $\O(\Lambda)$ unlabeled examples. If we insist on have error $\eta+\epsilon$ then $\Omega(\dim_\U)$ labeled examples are necessary. The above results show that there is a significant benefit in semi-supervised robust learning, as there are hypothesis classes with $\VCU=0$ and $\dim_\U$ arbitrary large. Having access only to labeled examples requires at least $\dim_{\U}$ labeled examples, while we require only $\O(1)$ labeled examples and the \textit{unlabeled} sample size is at the same order of the \textit{labeled} sample size required for supervised robust learning. Any improvement in the supervised robust sample complexity $\Lambda$ will immediate improve our bounds. A byproduct of our result is that if we assume that the distribution is robustly realizable by a hypothesis class (i.e., there exist a hypothesis with zero robust error) then, with respect to the non-robust loss (i.e., the standard $0$-$1$ loss) we can learn with only $O(\VCU)$ labeled examples, even if the $\VC$ is infinite.

Research Areas