[R] Adversarial explanations for understanding image classification decisions and improved neural network robustness
Open Access pre-print: https://arxiv.org/abs/1906.02896
Open Access PDF (low-resolution images, due to size restriction): https://arxiv.org/pdf/1906.02896.pdf
Peer-reviewed publication (with full-resolution images; also see bottom of this Reddit post): https://www.nature.com/articles/s42256-019-0104-6
Comparison of explanatory power between Grad-CAM [Selvaraju et al. 2017] and Adversarial Explanations (AEs), applied to a robust NN trained on CIFAR-10. The top four rows, subfigure a, compare the two methods on different inputs. For each row, the columns show: the original “Input” image, labeled with the most confidently predicted class, the correct class, and the NN’s confidence in each; two Grad-CAM explanations, one for each of the two classes labeled on the input; and two AEs, each split into the adversarial noise used to produce it and the resulting AE. Below those rows, subfigures b through i are annotated versions of the AEs from subfigure a, indicating regions that contributed to or detracted from each predicted class. See the main text for full commentary.
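For intuition about what “the adversarial noise used to produce it” means above, here is a minimal PGD-style sketch in PyTorch (my own illustration, assuming `model` is a differentiable classifier returning logits; the paper’s actual procedure, which relies on robust training, differs in its details):

```python
import torch

def adversarial_explanation(model, x, target_class, eps=8/255, step=1/255, steps=50):
    """PGD-style sketch: perturb `x` toward `target_class` within an
    L-infinity budget `eps`, returning (noise, adversarial example).
    An illustration of the general idea, not the paper's exact optimization."""
    x = x.detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        logits = model(x + delta)
        # Ascend the target-class logit; on a robust network the resulting
        # noise highlights regions the model treats as evidence for that class.
        logits[:, target_class].sum().backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()
            delta.clamp_(-eps, eps)                   # keep the noise small
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep pixels in [0, 1]
            delta.grad.zero_()
    return delta.detach(), (x + delta).detach()       # (noise, AE)
```

The two returned tensors correspond roughly to the two AE columns in subfigure a: the noise itself and the perturbed image it produces.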
Author’s note: The freely available pre-print on arXiv contains all content available in the Nature version, just in a slightly different ordering (IEEE vs Nature style). The resolution of the arXiv images is a bit lower, as the full document from pdflatex is ~97 MB due to the included images. A Ghostscript-optimized version, with full-resolution images, weighs in at 25 MB and may be found here: https://drive.google.com/open?id=1xGCja0BUQ2VR9nlKre6QzJ2Q-qpp8ub8