[D] Tuning of generated synthetic data for instance segmentation
I’m currently training a Mask R-CNN model on synthetic images generated with a procedure similar to the “Cut, Paste and Learn” paper. In a nutshell, the method randomly pastes crops of objects onto backgrounds, applying fairly standard augmentations to the crops and Gaussian / Poisson blending for the pasting step. The resulting images contain all the objects with perfect mask and bounding-box labels, over arbitrary backgrounds.
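For concreteness, the generation step I’m using looks roughly like this — a minimal sketch with a simple feathered-alpha blend standing in for the paper’s Gaussian blending (for Poisson blending you’d typically reach for OpenCV’s `seamlessClone` instead); all function names here are my own:

```python
import numpy as np

def _box_blur_rows(m, radius):
    """Box-blur a 2D float array along axis 0 using cumulative sums."""
    k = 2 * radius + 1
    p = np.pad(m, ((radius + 1, radius), (0, 0)), mode="constant")
    c = np.cumsum(p, axis=0)
    return (c[k:] - c[:-k]) / k  # centered window of width k per output row

def feather_mask(mask, radius=3, iters=2):
    """Approximate Gaussian feathering of a binary mask via repeated box blurs."""
    m = mask.astype(np.float64)
    for _ in range(iters):
        m = _box_blur_rows(m, radius)       # vertical pass
        m = _box_blur_rows(m.T, radius).T   # horizontal pass
    return m

def paste_crop(bg, crop, crop_mask, top, left):
    """Alpha-blend an object crop onto a background at (top, left).

    Returns the composited image plus a full-size instance mask and
    bounding box (x0, y0, x1, y1) — the 'free' labels from generation.
    """
    h, w = crop.shape[:2]
    alpha = feather_mask(crop_mask)[..., None]       # soft edges for blending
    out = bg.astype(np.float64).copy()
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * crop + (1.0 - alpha) * region
    full_mask = np.zeros(bg.shape[:2], dtype=np.uint8)
    full_mask[top:top + h, left:left + w] = crop_mask  # label uses the hard mask
    ys, xs = np.nonzero(full_mask)
    bbox = (xs.min(), ys.min(), xs.max(), ys.max())
    return out.astype(np.uint8), full_mask, bbox
```

In practice the crops would first go through the usual augmentations (scale, rotation, color jitter) before being pasted, and multiple objects would be composited per background.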
However, the generated training data still looks quite different from real images. I do have a large dataset of unlabeled real images containing the real objects. Is anyone aware of a method for transforming a generated image to look more like the images in the real dataset? I’d want to preserve spatial structure so the generated labels stay valid, while adding noise, shadows, and pixel-level artifacts in a meaningful way that resembles those found in my real data.
My first thought was to look for papers using something like auto-encoders, but my searches were flooded with papers on VAEs and end-to-end generation. Is anyone aware of research targeting this specific problem?