[D] StyleGAN 2, generative models, and the nigh impossibility of detecting state-of-the-art under realistic conditions
I just finished reading “Analyzing and Improving the Image Quality of StyleGAN” and I’m very impressed by the improvements to an already very impressive model. NVlabs really knocked it out of the park.
The quality of the images was already quite high, but the new model is much less prone to the distinctive artifacts of its predecessor. Additionally, it is now much easier to project a real image into the StyleGAN latent space for manipulation. It isn’t hard to imagine a system whereby social media users have their photos automatically embedded into an FFHQ StyleGAN feature space and moved along a learned “attractiveness” vector before publishing. Or, similarly, tools for creating fraudulent imagery of others that are even more widespread and harder to detect than those that already exist.
While the paper highlights the advantages of detecting GAN images using projection, this approach only works in a white-box setting, and seems to only have been tested against unobfuscated images. I feel that in some ways, this result provides a false sense of security. While it may be possible to more easily find unaltered images from officially released StyleGAN 2 models, the overall impact of higher-quality generative models will likely be an increase in detection difficulty under practical conditions.
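For readers unfamiliar with the idea, projection-based detection works roughly like this: optimize a latent code so the generator's output matches the suspect image, and flag the image as generated if the reconstruction error is low (i.e., the image lies in the generator's range). Below is a minimal toy sketch of that logic in NumPy. The linear map `W` is a stand-in for a real generator, and the threshold and hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": a fixed linear map from latent space to image space.
# (A stand-in for a StyleGAN generator, purely for illustration.)
latent_dim, image_dim = 8, 64
W = rng.normal(size=(image_dim, latent_dim))

def generate(z):
    return W @ z

def project(image, steps=3000, lr=0.002):
    """Gradient descent on z to minimize ||G(z) - image||^2."""
    z = np.zeros(latent_dim)
    for _ in range(steps):
        residual = generate(z) - image
        z -= lr * (2.0 * W.T @ residual)  # gradient of the squared error
    return z

def detection_score(image):
    """Relative reconstruction error; low means the image is well
    explained by the generator, i.e. likely generated by it."""
    z = project(image)
    return np.linalg.norm(generate(z) - image) / np.linalg.norm(image)

fake = generate(rng.normal(size=latent_dim))  # lies in the generator's range
real = rng.normal(size=image_dim)             # a generic image, almost surely not

print(detection_score(fake))  # near zero
print(detection_score(real))  # substantially larger
```

The white-box caveat is visible even in this toy: the detector needs `W` itself, and a slightly perturbed or differently trained generator (or simple post-processing of the image) would change the reconstruction error and break any fixed threshold.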
All in all, I’m heavily reminded of Nicholas Carlini’s past work on adversarial attacks, where he repeatedly demonstrated that many published “defenses to adversarial examples” only addressed weak attacks like FGSM, or could easily be circumvented by training a simple approximation of the defended model and attacking that instead.
New defense mechanisms are often evaluated under unrealistic laboratory settings against weak attacks. Not enough time has passed for projection-based detection to be evaluated substantively, but I’ll be surprised if projection is a solution that ends up finding much success at scale (particularly given that by the time such a system has been adequately tuned for a particular use-case, the state-of-the-art models will likely have moved past it).
None of this should be interpreted as a criticism of the paper — I don’t think the authors can be expected to do an exhaustive evaluation of abuse countermeasures in addition to making major innovations to the state-of-the-art in image generation.
This is a problem common in the generative model space. There are parallels to the OpenAI decision to hold back their 1.5B parameter GPT-2 model on the initial release of their paper. Holding back a model seems antithetical to machine learning research (and arguably only reduces the number of people who can research countermeasures), but without a lengthy head-start, it seems unrealistic to expect to ever be able to detect the current state-of-the-art generative models in the wild.
So what’s the solution? Should researchers working on generative models be developing a closer working relationship with those trying to detect the outputs of those models? Should models always be released to the entire research community at once, or should there be a staggered release cycle?
Or alternatively, should we give up entirely on detecting generative models via their output, and instead focus on simpler systems that make abusing them at scale more difficult (better online verification practices, curated fact-checking resources, etc.)?
What are your thoughts on the potential for abuse of generative models, and what [should we]/[can we] do about it as researchers/practitioners/humans?