Abstract
It is well known that neural networks are vulnerable to ``adversarial
examples'': inputs that the network misclassifies even though they are
indistinguishable from true data points. We propose a simple
modification to standard neural network architectures, \emph{thermometer
encoding}, which significantly increases the robustness of the network to
adversarial examples. We demonstrate this robustness with experiments
on the MNIST, CIFAR-10, CIFAR-100, and SVHN datasets, and show that
models with thermometer-encoded inputs consistently achieve higher accuracy
on adversarial examples than the unmodified networks, while matching their
accuracy on non-adversarial examples and training more quickly.