We start with the simplest adversarial attack, the Fast Gradient Sign Method (FGSM). It is a fairly inexpensive attack computationally and worked against most of the sample images. FGSM, however, struggles against some of the harder images with higher initial confidences, and its manipulations of the image are noticeable most of the time. Still, this attack has its place when large quantities of adversarial examples need to be generated quickly.
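For reference, a minimal single-step FGSM sketch in PyTorch could look like the following. The helper name, the epsilon value, and the assumption that the model outputs logits for images scaled to [0, 1] are illustrative, not the exact code from our experiments.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """Single-step FGSM: perturb the image along the sign of the loss gradient."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then clip to a valid image range.
    adversary = image + epsilon * image.grad.sign()
    return adversary.clamp(0, 1).detach()
```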

One extension of FGSM, the Basic Iterative Method (BIM), is more computationally expensive. However, it is also much more effective at both attacking the network and hiding the perturbations from humans. In our experiments BIM succeeded almost every time and produced the most confident adversaries.
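Under the same assumptions as above, a BIM sketch essentially wraps the FGSM step in a loop and projects the result back into an epsilon ball around the original image; the step size, epsilon, and iteration count below are placeholders.

```python
import torch
import torch.nn.functional as F

def bim_attack(model, image, label, epsilon=0.01, alpha=0.002, steps=10):
    """Iterative FGSM (BIM): take several small signed-gradient steps while
    keeping the total perturbation inside an epsilon ball around the original."""
    original = image.clone().detach()
    adversary = original.clone()
    for _ in range(steps):
        adversary.requires_grad_(True)
        loss = F.cross_entropy(model(adversary), label)
        grad, = torch.autograd.grad(loss, adversary)
        adversary = adversary.detach() + alpha * grad.sign()
        # Project back into the epsilon neighbourhood and the valid image range.
        adversary = original + (adversary - original).clamp(-epsilon, epsilon)
        adversary = adversary.clamp(0, 1)
    return adversary.detach()
```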

Adversaries that are truly imperceptible can be generated by DeepFool, a different type of algorithm.
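To illustrate what makes DeepFool different, here is a rough multiclass sketch that linearizes the classifier at each step and takes the smallest move towards the nearest decision boundary. It assumes a single image with a batch dimension and a fixed set of candidate classes; the helper name and the overshoot value are illustrative assumptions, not the reference implementation.

```python
import torch

def deepfool_attack(model, image, num_classes=10, max_iters=50, overshoot=0.02):
    """DeepFool sketch: repeatedly linearize the classifier and take the
    smallest step that crosses the closest decision boundary."""
    image = image.clone().detach()
    perturbed = image.clone()
    original_label = model(image).argmax(dim=1).item()

    for _ in range(max_iters):
        perturbed.requires_grad_(True)
        logits = model(perturbed)[0]
        if logits.argmax().item() != original_label:
            break  # the predicted label has flipped, we are done
        grad_orig, = torch.autograd.grad(logits[original_label], perturbed, retain_graph=True)

        best_ratio, best_w = None, None
        for k in range(num_classes):
            if k == original_label:
                continue
            grad_k, = torch.autograd.grad(logits[k], perturbed, retain_graph=True)
            w = grad_k - grad_orig                        # linearized boundary normal
            f = (logits[k] - logits[original_label]).item()
            ratio = abs(f) / (w.norm() + 1e-8)            # distance to this boundary
            if best_ratio is None or ratio < best_ratio:
                best_ratio, best_w = ratio, w
        # Minimal step (plus a small overshoot) towards the closest boundary.
        r = (1 + overshoot) * best_ratio * best_w / (best_w.norm() + 1e-8)
        perturbed = (perturbed + r).detach()
    return perturbed.detach()
```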

Further Reading and Other Work

The attacks we discussed in this project are optimization-based. Xiao et al. instead use a feed-forward network to generate adversaries and a discriminator network to ensure the image remains realistic to humans. Their AdvGAN attacks are targeted and do not need access to the target model's gradients once the generator network is trained.
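To make the idea concrete, a hedged sketch of the AdvGAN generator objective could look like the following. The module names, the targeted cross-entropy term, and the loss weights are assumptions for illustration, and the discriminator's own update step is omitted.

```python
import torch
import torch.nn.functional as F

def advgan_generator_loss(generator, discriminator, target_model, images, target_labels,
                          alpha=1.0, beta=10.0, c=0.3):
    """Sketch of the generator objective: fool the target model with the chosen
    target class, look realistic to the discriminator, and keep the perturbation small."""
    perturbation = generator(images)
    adv_images = (images + perturbation).clamp(0, 1)

    # Targeted adversarial loss: push the target model towards the target class.
    loss_adv = F.cross_entropy(target_model(adv_images), target_labels)

    # GAN loss: the generator wants the discriminator to call the adversary "real".
    d_logits = discriminator(adv_images)
    loss_gan = F.binary_cross_entropy_with_logits(d_logits, torch.ones_like(d_logits))

    # Hinge loss bounding the per-image L2 norm of the perturbation.
    per_sample_norm = perturbation.flatten(1).norm(dim=1)
    loss_hinge = torch.clamp(per_sample_norm - c, min=0).mean()

    return loss_adv + alpha * loss_gan + beta * loss_hinge
```

Once the generator is trained against a particular model, producing an adversary is a single forward pass, which is why no gradient of the target model is needed at attack time.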

Earlier we presented some work that explains why adversarial examples exist. Geirhos et al. provide an interesting generalization of this: both biological and artificial learning systems try to achieve a goal with the “least effort”, which can lead them to solutions that are not aligned with human expectations. This is called shortcut learning. In this context, adversarial attacks in the image domain are considered helpful for testing how well a model's solution aligns with human expectations.