In the Beta-VAE paper (https://openreview.net/pdf?id=Sy2fzU9gl), the authors state that setting Beta > 1 helps the network learn independent latent representations. However, in a VAE the posterior distribution is already assumed to be a Gaussian with a diagonal covariance matrix, i.e.
q(z|x) = N(mu(x), Sigma(x)), where Sigma(x) is a diagonal covariance matrix.
This means the latents we generate are already independent given an input image x. So why should increasing the pressure on the KL divergence term between the posterior and the Gaussian prior help any further in learning independent latents, when the posterior is already assumed to factorize?
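For concreteness, here is a minimal PyTorch sketch of the Beta-weighted KL term I am asking about (the encoder outputs mu and logvar are hypothetical names for the parameters of the diagonal-Gaussian posterior):

```python
import torch

def beta_vae_kl(mu: torch.Tensor, logvar: torch.Tensor, beta: float = 4.0) -> torch.Tensor:
    # Closed-form KL between the diagonal-Gaussian posterior q(z|x) = N(mu, diag(exp(logvar)))
    # and the standard normal prior p(z) = N(0, I), computed per latent dimension.
    kl_per_dim = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar)
    kl = kl_per_dim.sum(dim=1)   # one KL value per input x
    return beta * kl.mean()      # beta > 1 up-weights this term in the beta-VAE objective
```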
submitted by /u/shamitlal