[R] Mask Based Unsupervised Content Transfer
Hi All, Author here –
Given two domains where one contains some additional information compared to the other, our method disentangles the common and the separate parts and transfers the separate information from one image to another using a mask, without any supervision at train time. For example, we can transfer the specific facial hair from an image of a man with a mustache to an image of a shaved person. Using a mask enables state-of-the-art transfer quality (see example here). In addition, the generated mask can serve as a semantic segmentation of the separate part, so our method also performs weakly-supervised semantic segmentation, using only class labels as supervision, achieving state-of-the-art performance (see example here).
In short, our architecture consists of two encoders, two decoders, and a discriminator. One encoder encodes the common part and the other encodes the separate part. The discriminator is used to disentangle the encodings into the separate and common parts correctly. During training, one decoder reconstructs from the common part alone, and the second decoder decodes the separate part on top of it using a mask. At inference, we use only the second decoder, which, given the relevant encoding, adds the specific content to a new image. We also use a novel regularization scheme that encourages the mask to be minimal.
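To make the masked composition concrete, here is a minimal NumPy sketch of the core idea: a soft mask blends the transferred (separate) content into a target image, and an L1-style sparsity penalty is one way to encourage the mask to be minimal. The function names and the penalty weight are illustrative assumptions, not taken from the paper or the repo:

```python
import numpy as np

def apply_masked_content(base, content, mask):
    """Blend decoded content into a base image through a soft mask.

    base, content: (H, W, C) float arrays in [0, 1]
    mask: (H, W, 1) float array in [0, 1]; 1 = pixel taken from `content`
    """
    return mask * content + (1.0 - mask) * base

def mask_sparsity_penalty(mask, weight=0.1):
    """L1-style penalty pushing the mask to cover as little of the image
    as possible (the weight here is a hypothetical hyperparameter)."""
    return weight * np.abs(mask).mean()

# Toy example: 4x4 single-channel images, mask active on one pixel only.
base = np.zeros((4, 4, 1))
content = np.ones((4, 4, 1))
mask = np.zeros((4, 4, 1))
mask[0, 0, 0] = 1.0

out = apply_masked_content(base, content, mask)
# Only the masked pixel receives the transferred content;
# everything else keeps the base image's value.
```

In the actual model the mask and content come from the second decoder, but the blend-and-penalize pattern above is the gist of why the output stays faithful to the target image outside the masked region.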
Refer to the full paper for more details.
PyTorch implementation is on GitHub.
Feel free to ask questions.