- Aryeh Kontorovich
- Idan Attias
- Yishay Mansour
We consider a model of robust learning in an adversarial environment. The learner gets uncorrupted training data with access to possible corruptions that may be used by the adversary during testing. Their aim is to build a robust classifier that would be tested on future adversarially corrupted data. We use a zero-sum game between the learner and the adversary as our game theoretic framework. The adversary is limited to k possible corruptions for each input. Our model is closely related to the adversarial examples model of Schmidt et al. (2018); Madry et al. (2017)
We refer to binary and multi-class classification settings, and regression setting. Our main results are generalization bounds for all settings. For the binary classification setting, we both improve a generalization bound, previously found in Feige, Mansour, and Schapire (2015), which handles a weighted average of hypotheses from H, and also are able to handle an infinite hypothesis class H. The sample complexity is improved from O( 1 log( |H| )) to O( 1 (k log(k) VC(H) + log 1 )). ε4δε2 δ The core of all our proofs is based on bounds of the empirical Rademacher complexity.
For the binary classification, we use a known regret minimization algorithm of Feige, Mansour, and Schapire (2015) that uses an ERM oracle as a blackbox and we expand on the multi-class and regression settings. The algorithm provides us near optimal policies for the players on a given training sample. The learner starts with a fixed hypothesis class H and chooses a convex combination of hypotheses from H. The learner’s loss is measured on adversarial corrupted inputs.
Along the way, we obtain results on fat-shattering dimension and Rademacher complexity of k-fold maxima over function classes; these may be of independent interest.