[D] Does anybody know of any conditional GAN work that encodes an understanding of both object and background?
These days, the high-fidelity GAN papers are often focused on reproducing class-conditional datasets such as ImageNet. But for use cases where it matters to generate not just “dog” but something like “dog on grass” or “dog on concrete”, has any work been done to independently encode notions of object and background within a generator?
A naive approach would be to explode the number of classes by splitting the data into many more fine-grained object-background combinations. But that’s not very practical or interesting 😛
One thing I was thinking was to push this into the image translation domain (e.g. CycleGAN) – instead of converting an object into another object (e.g. horse to zebra), the goal would be to convert a background into a different sort of background. Any thoughts on this approach?
Another more experimental/interesting idea was a multi-generator approach: one generator that in theory could produce “object”, another that could produce “background/context” – you’d add their output tensors together and backpropagate the loss through both generators. But as far as I can tell, nobody’s done anything like this, and I’ve no idea how to encode a prior on a specific type of generation, so I’m just spitballing here. (Miniature hypothesis: you could use different architectures that have inductive biases toward different types of content – e.g. plain convolutional GANs tend to do well with texture, while self-attention GANs do better with structure/objects. There are surely better ways to get networks to specialize, via different loss functions or something, but I’m not sure how that would all fit together…)
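To make the multi-generator idea concrete, here’s a toy numerical sketch of the key mechanical point: if the fake image is the *sum* of two generator outputs, the gradient at the sum node is copied to each branch, so a single discriminator loss trains both generators. Everything here is an illustrative assumption (linear “generators” `W1 @ z1` and `W2 @ z2`, a squared-norm stand-in for the discriminator loss) – not a claim about any published method.

```python
import numpy as np

# Toy sketch (all names/shapes are illustrative assumptions):
# an "object" generator and a "background" generator, each a
# linear map from its own latent code, composited by addition.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))      # "object" generator weights
W2 = rng.normal(size=(4, 3))      # "background" generator weights
z1 = rng.normal(size=3)           # object latent code
z2 = rng.normal(size=3)           # background latent code

fake = W1 @ z1 + W2 @ z2          # additive compositing of the two outputs
loss = 0.5 * np.sum(fake ** 2)    # stand-in for a real discriminator loss

# Chain rule: dloss/dfake = fake, and the sum node passes that
# gradient unchanged to BOTH branches, so both generators receive
# a nonzero weight gradient from the single shared loss.
grad_fake = fake
dW1 = np.outer(grad_fake, z1)     # dloss/dW1
dW2 = np.outer(grad_fake, z2)     # dloss/dW2
```

Of course, in a real GAN the addition would more likely be a learned compositing step (e.g. an alpha mask predicted by the object generator), since a raw sum gives neither network any pressure to specialize – that’s exactly where some extra prior or loss term would have to come in.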