[R] Adversarial Examples Aren’t Bugs, They’re Features
Blog post: http://gradientscience.org/adv
Hi, I’m one of the lead authors on this paper.
TL;DR: We show that adversarial examples aren't weird aberrations or random artifacts; rather, they arise from meaningful but imperceptible features of the data distribution (i.e., features that are genuinely useful for generalization). We demonstrate this through a series of experiments showing that (a) you can learn from these imperceptible features alone, embedded in a completely mislabeled training set, and still generalize to the true test set, and (b) you can remove these imperceptible features and then generalize *robustly* to the true test set with standard training.
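For intuition on how the "mislabeled" dataset in (a) is built: each input is perturbed *toward* an incorrect target label and then stored with that incorrect label, so only the imperceptible features carry the new label's signal. Here's a minimal toy sketch of that construction, NOT the paper's actual pipeline (which attacks a trained deep network with PGD); the linear model, the single FGSM-style step, and all names here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained classifier: a linear model with
# logits = x @ W (10-dim inputs, 2 classes).
W = rng.normal(size=(10, 2))

def targeted_step(x, target, eps=0.1):
    """One signed-gradient step pushing x toward class `target`
    under softmax cross-entropy (an FGSM-style targeted step)."""
    logits = x @ W
    p = np.exp(logits - logits.max())
    p /= p.sum()
    onehot = np.eye(W.shape[1])[target]
    grad_x = W @ (p - onehot)          # d(CE loss wrt target)/dx
    return x - eps * np.sign(grad_x)   # descend: move toward target

# Build one example of the relabeled dataset: perturb x toward a
# wrong label t, then store (x_adv, t). The label looks wrong, but
# the small perturbation now encodes features predictive of t.
x = rng.normal(size=10)
t = 1
x_adv = targeted_step(x, t)
relabeled_example = (x_adv, t)
```

Training a fresh model on many such (x_adv, t) pairs, then evaluating on the clean, correctly labeled test set, is the shape of experiment (a).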
We would love to answer any questions/comments!