[D] Learning a prior on the latent variables for generating samples from a VAE
I’m trying to find literature on generating samples from a VAE where a prior over the latent variables is learned as a separate step after the VAE itself has been trained. The learned prior is then sampled to produce the latent z that is fed to the decoder, rather than drawing z ~ N(0, 1) (the usual choice when the prior p(z) is a unit Gaussian).
Empirically, I’ve noticed that for very complex and diverse images, reconstructions from a traditional VAE look good enough (blurry, but that’s a different issue), whereas entirely new samples generated by feeding the decoder a random z ~ N(0, 1) look awful.
What if we don’t just randomly sample z ~ N(0, 1), but produce robust new z’s in some other way, such as through a function approximator? For example, after training the VAE, run the encoder over your entire dataset, compute mu and sigma for each example, and train an autoregressive feed-forward model on them to produce a plausible z. Or use the same data to train a model that takes uniform random noise as input and produces z’s as output. This thought is inspired by the VQ-VAE approach (https://arxiv.org/abs/1711.00937, and https://arxiv.org/abs/1906.00446), where they train a PixelCNN on the quantized latents that the VQ-VAE produces for the entire dataset, then sample new latents from it to produce new images.
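To make the first idea concrete, here is a minimal sketch of the "fit a prior afterwards" step. It is a toy, not a real implementation: the latent space is 2-D, the encoder outputs (`mus`, `sigmas`) are synthetic stand-ins rather than values from a trained VAE, and the "autoregressive model" is just a linear-Gaussian factorization p(z1) p(z2 | z1). All function names are made up for illustration. In practice you would swap in your own encoder outputs and a more expressive model.

```python
import random
import statistics


def sample_posterior_latents(mus, sigmas, rng):
    """Draw one z ~ q(z|x) per data point from its encoder outputs (mu, sigma)."""
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for mu, sigma in zip(mus, sigmas)
    ]


def fit_autoregressive_prior(latents):
    """Fit p(z1) * p(z2 | z1) with Gaussian conditionals on 2-D latents."""
    z1 = [z[0] for z in latents]
    z2 = [z[1] for z in latents]
    m1, m2 = statistics.fmean(z1), statistics.fmean(z2)
    var1 = statistics.pvariance(z1, mu=m1)
    cov12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / len(latents)
    # Least-squares fit of z2 on z1 gives the mean of the conditional p(z2 | z1).
    slope = cov12 / var1
    intercept = m2 - slope * m1
    residuals = [b - (intercept + slope * a) for a, b in zip(z1, z2)]
    return {
        "m1": m1,
        "s1": var1 ** 0.5,
        "slope": slope,
        "intercept": intercept,
        "s2": statistics.pstdev(residuals),
    }


def sample_prior(prior, n, rng):
    """Sample z one dimension at a time: z1 first, then z2 conditioned on z1."""
    samples = []
    for _ in range(n):
        z1 = rng.gauss(prior["m1"], prior["s1"])
        z2 = rng.gauss(prior["intercept"] + prior["slope"] * z1, prior["s2"])
        samples.append([z1, z2])
    return samples


# Synthetic stand-in for encoder outputs (assumption: NOT from a real VAE).
# The means are off-centre and correlated, so N(0, 1) would be a poor match.
rng = random.Random(0)
mus = []
for _ in range(2000):
    a = rng.gauss(2.0, 0.5)
    mus.append([a, -1.5 * a + rng.gauss(0.0, 0.2)])
sigmas = [[0.1, 0.1] for _ in mus]

latents = sample_posterior_latents(mus, sigmas, rng)
prior = fit_autoregressive_prior(latents)
new_z = sample_prior(prior, 5, rng)  # these would be passed to the decoder
```

The point of the toy data is that the encoded latents sit far from the origin and are strongly correlated, so samples from N(0, 1) would mostly land where the decoder never saw anything during training, while samples from the fitted prior stay in the region the encoder actually uses.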
Would love to hear your thoughts, or get links to papers on this. Thanks!