Blog

TL;DR: The representation layers of standard networks are very useful for tasks such as transfer learning, but they are brittle: attempts to manipulate or visualize them in natural ways tend to break down. We propose robust optimization (adversarial training) as a way to enforce priors on the features a model learns. The resulting models (simply ℓp-robust classifiers) support many kinds of “natural” manipulation that follow directly from our idealization of representations as high-level features, but that are impossible with standard networks. This suggests that robustness may be useful well beyond protection against adversarial examples.
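
For readers unfamiliar with the term, adversarial training replaces each training input with a worst-case perturbation inside a small ℓp ball before taking the usual gradient step on the model parameters. The sketch below shows only that inner maximization, using projected gradient ascent under an ℓ∞ constraint. It is a toy illustration, not the code behind the post: the linear logistic model, the function names (`logistic_loss`, `loss_grad_x`, `pgd_linf`), and the default `eps`, `step`, and `iters` values are all assumptions chosen to keep the example dependency-free.

```python
import math


def logistic_loss(w, b, x, y):
    """Logistic loss of a linear model (hypothetical stand-in); y is +1 or -1."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return math.log1p(math.exp(-margin))


def loss_grad_x(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    coeff = -y / (1.0 + math.exp(margin))  # dL/dmargin * dmargin/d(w.x)
    return [coeff * wi for wi in w]


def sign(v):
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_linf(w, b, x, y, eps=0.1, step=0.025, iters=10):
    """Approximate the inner max of adversarial training.

    Repeatedly take a signed-gradient ascent step on the loss, then
    project back onto the l_inf ball of radius eps around the clean x.
    All default values are illustrative assumptions.
    """
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_x(w, b, x_adv, y)
        # Ascent step: move each coordinate in the direction that raises the loss.
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip each coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

In a full adversarial training loop, the parameter update would then be computed on `pgd_linf(w, b, x, y)` instead of on the clean `x`. For a deep network, the hand-written input gradient would be replaced by automatic differentiation.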

submitted by /u/andrew_ilyas