Adversarial Reprogramming of Neural Networks
Abstract
Deep neural networks are susceptible to adversarial attacks. In computer vision,
well-crafted perturbations to images can cause neural networks to make mistakes
such as confusing a cat with a computer. Previous adversarial attacks have been
designed to degrade the performance of models or to cause machine learning models
to produce specific outputs chosen ahead of time by the attacker. We introduce
attacks that instead reprogram the target model to perform a task chosen by the
attacker—without the attacker needing to specify or compute the desired output
for each test-time input. This attack finds a single adversarial perturbation that
can be added to all test-time inputs to a machine learning model in order to cause
the model to perform a task chosen by the adversary—even if the model was not
trained to do this task. These perturbations can thus be considered a program
for the new task. We demonstrate adversarial reprogramming on six ImageNet
classification models, repurposing them to perform a counting task as well as two
classification tasks: classifying MNIST and CIFAR-10 examples presented as inputs
to the ImageNet models.
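
The abstract does not spell out the mechanics, but a minimal sketch of the idea it describes, a single learned perturbation shared by every test-time input, might look like the following. The choice of architecture (torchvision's resnet50), the tanh parameterization, the placement of the embedded digit, the hard-coded mapping of the first ten ImageNet classes to digit labels, and all hyperparameters are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen pretrained ImageNet classifier (illustrative choice of architecture).
net = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
for p in net.parameters():
    p.requires_grad_(False)

# The "adversarial program": one ImageNet-sized tensor shared by ALL inputs.
program = torch.zeros(1, 3, 224, 224, requires_grad=True)

# Mask that zeroes out the central region where the small task input sits,
# so the program only occupies the surrounding border.
mask = torch.ones(1, 3, 224, 224)
mask[:, :, 98:126, 98:126] = 0.0  # 28x28 slot for an MNIST digit (assumed placement)

def reprogram(x_small):
    """Embed a batch of 28x28 MNIST digits into the shared adversarial program."""
    canvas = torch.zeros(x_small.size(0), 3, 224, 224)
    canvas[:, :, 98:126, 98:126] = x_small  # digit broadcast across RGB channels
    # tanh bounds the program's pixels; the mask leaves the digit untouched.
    return canvas + torch.tanh(program * mask)

def remap_logits(logits):
    """Assumed label mapping: ImageNet classes 0-9 stand in for digits 0-9."""
    return logits[:, :10]

opt = torch.optim.Adam([program], lr=0.05)

def train_step(x_small, y_digit):
    """Optimize the program so the frozen net classifies embedded digits."""
    opt.zero_grad()
    loss = F.cross_entropy(remap_logits(net(reprogram(x_small))), y_digit)
    loss.backward()  # gradients flow only to the program; the net is frozen
    opt.step()
    return loss.item()
```

Once trained, the same `program` tensor is added to every embedded test input, which is why it can be read as a "program" for the new task rather than a per-example perturbation.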