Theoretically Grounded Loss Functions and Algorithms for Adversarial Robustness

Pranjal Awasthi
Anqi Mao
The Thirty-Sixth International Conference on Artificial Intelligence and Statistics (AISTATS 2023)

Abstract

Adversarial robustness is a critical property of classifiers in applications as they are increasingly deployed in complex real-world systems. Yet, achieving adversarial robustness in machine learning remains a persistent challenge, and the choice of the surrogate loss function used for training is a key factor. We present a family of new loss functions for adversarial robustness, *smooth adversarial losses*, which we show can be derived in a general way from broad families of loss functions used in multi-class classification. We prove strong $H$-consistency theoretical guarantees for these loss functions, including multi-class $H$-consistency bounds for sum losses in the adversarial setting. We design new regularized algorithms based on the minimization of these principled smooth adversarial losses (PSAL). We further show through a series of extensive experiments with the CIFAR-10, CIFAR-100 and SVHN datasets that our PSAL algorithm consistently outperforms the current state-of-the-art technique, TRADES, both for robust accuracy against $\ell_{\infty}$-norm bounded perturbations and, even more significantly, for clean accuracy. Finally, we prove that, unlike PSAL, the TRADES loss in general does not admit an $H$-consistency property.