[Discussion] Alternatives to pixelwise error as the reconstruction objective for VAEs
I am currently working on a project that aims to improve the perceptual quality of VAE reconstructions. Since the basic VAE objective uses a pixelwise error for the reconstruction term, the generated images have a characteristic blurriness that makes them look unrealistic. I did keyword searches on Scholar and ResearchGate, but could not find work that replaces this pixelwise metric with something more appropriate for images.
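For concreteness, here is a minimal sketch of the objective I mean. It assumes a unit-variance Gaussian decoder (so the reconstruction term reduces to a squared pixelwise error) and a diagonal Gaussian posterior with a standard normal prior. The function and argument names are only illustrative and not taken from any library.

```python
import numpy as np

def pixelwise_vae_loss(x, x_recon, mu, log_var):
    """Negative ELBO (up to additive constants), averaged over the batch.

    Assumes a unit-variance Gaussian decoder and a diagonal Gaussian
    posterior q(z|x) = N(mu, exp(log_var)) with prior N(0, I).
    """
    # Reconstruction term: half the summed squared per-pixel error per sample.
    pixel_axes = tuple(range(1, x.ndim))
    recon = 0.5 * np.sum((x - x_recon) ** 2, axis=pixel_axes)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian.
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return np.mean(recon + kl)
```

The `recon` term is the part I would like to swap out for something perceptually better.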
The closest I got was the paper “Autoencoding beyond pixels using a learned similarity metric” (https://arxiv.org/pdf/1512.09300.pdf). This is a great piece of work, and I find the idea of combining a GAN discriminator with a VAE superb.
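As I understand that paper, the pixelwise error is replaced by an error measured on intermediate features of the GAN discriminator. Below is a rough sketch of that idea only. The fixed random ReLU projection is a hypothetical stand-in for the discriminator layer (in the paper the features are learned jointly with the GAN), and all names here are my own.

```python
import numpy as np

def make_feature_extractor(in_dim, feat_dim, seed=0):
    """Hypothetical stand-in for a discriminator layer: fixed random ReLU features."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((in_dim, feat_dim)) / np.sqrt(in_dim)

    def features(x):
        flat = x.reshape(x.shape[0], -1)   # flatten each image to a vector
        return np.maximum(flat @ w, 0.0)   # linear projection followed by ReLU

    return features

def feature_recon_loss(x, x_recon, features):
    """Squared error in feature space instead of pixel space, batch-averaged."""
    diff = features(x) - features(x_recon)
    return np.mean(0.5 * np.sum(diff ** 2, axis=1))
```

In a VAE this term would take the place of the pixelwise reconstruction error, while the KL term stays unchanged.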
In my search, I also found flow-based papers such as Glow and RealNVP. But these use invertible transformations, so the exact likelihood can be computed easily: the data is a deterministic, invertible function of the latent variable, and its density follows from the prior via the change-of-variables formula. I am actually looking for variational-inference generative models that simply use a different form of reconstruction objective for better perceptual results.
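To illustrate what I mean about flows, here is a toy sketch of the change-of-variables computation for a single elementwise affine layer x = scale * z + shift with a standard normal prior on z. This is far simpler than Glow or RealNVP and is not their actual architecture; the names are illustrative.

```python
import numpy as np

def affine_flow_log_prob(x, scale, shift):
    """Exact log p(x) for x = scale * z + shift, with z ~ N(0, I)."""
    # Invert the flow to recover the latent that produced x.
    z = (x - shift) / scale
    # Log-density of z under the standard normal prior, per dimension.
    log_prior = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    # Log |det dz/dx| per dimension for the elementwise affine map.
    log_det = -np.log(np.abs(scale))
    return np.sum(log_prior + log_det, axis=-1)
```

There is no separate reconstruction term here at all, which is why these models do not answer my question.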
I kindly ask fellow redditors to point me to any works you are aware of. It would be a great help. Thank you.