[R] Finding a human-like classifier
There have been many attempts to explain the trade-off between accuracy and adversarial robustness. However, there is still no clear understanding of the behavior of a robust classifier that has human-like robustness.
We argue (1) why we need to consider adversarial robustness against perturbations of varying magnitudes rather than focusing on a single fixed perturbation threshold, (2) why we need different methods for generating adversarially perturbed samples when training a robust classifier and when measuring the robustness of classifiers, and (3) why we need to prioritize adversarial accuracies at different perturbation magnitudes.
We introduce Lexicographical Genuine Robustness (LGR) of classifiers, which combines the above requirements. We also suggest a candidate oracle classifier called the "Optimal Lexicographically Genuinely Robust Classifier (OLGRC)", which prioritizes accuracy on meaningful adversarially perturbed examples generated by smaller-magnitude perturbations. Unlike existing adversarial training methods, the training algorithm for estimating the OLGRC requires lexicographical optimization. To apply lexicographical optimization to neural networks, we utilize Gradient Episodic Memory (GEM), which was originally developed for continual learning to prevent catastrophic forgetting.
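As a rough illustration of how GEM can enforce a lexicographical priority, the following sketch shows a single-constraint gradient projection in the spirit of GEM: when the current update direction conflicts with the gradient of a higher-priority objective, the conflicting component is projected out so the update cannot increase the higher-priority loss. The function name `gem_project` and the NumPy setup are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gem_project(g, g_mem):
    """Hypothetical GEM-style projection with one constraint.

    g     : gradient of the current (lower-priority) objective
    g_mem : gradient of the higher-priority objective that must not regress

    If g points against g_mem (negative inner product), remove the
    conflicting component of g along g_mem; otherwise leave g unchanged.
    """
    dot = g @ g_mem
    if dot < 0:  # update would increase the higher-priority loss
        g = g - (dot / (g_mem @ g_mem)) * g_mem
    return g

# Example: g conflicts with the higher-priority gradient direction [0, 1].
g = np.array([1.0, -1.0])
g_mem = np.array([0.0, 1.0])
g_proj = gem_project(g, g_mem)  # conflicting component removed -> [1.0, 0.0]
```

After projection, `g_proj @ g_mem >= 0`, so a small step along `-g_proj` does not (to first order) harm the higher-priority objective, which is the property a lexicographical training scheme needs.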
TL;DR: We try to design and train a classifier whose adversarial robustness more closely resembles the robustness of humans.