Motivating the Rules of the Game for Adversarial Example Research
Abstract
Advances in machine learning have led to broad deployment of systems with impressive
performance on important problems. Nonetheless, these systems can be induced
to make errors on data that are surprisingly similar to examples the learned system
handles correctly. The existence of these errors raises a variety of questions about
out-of-sample generalization and whether bad actors might use such examples to abuse
deployed systems. As a result of these security concerns, there has been a flurry of
recent papers proposing algorithms to defend against such malicious perturbations of
correctly handled examples. It is unclear how such misclassifications represent a different
kind of security problem than other errors, or even other attacker-produced
examples that have no specific relationship to an uncorrupted input. In this paper,
we argue that adversarial example defense papers have, to date, mostly considered
abstract, toy games that do not relate to any specific security concern. Furthermore,
defense papers have not yet precisely described all the abilities and limitations of attackers
that would be relevant in practical security. Towards this end, we establish a
taxonomy of motivations, constraints, and abilities for more plausible adversaries. Finally,
we provide a series of recommendations outlining a path forward for future work
to more clearly articulate the threat model and perform more meaningful evaluation.