
[P] Help implementing “Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders” paper?

Here is the link to the paper I am referring to:

I am having trouble understanding how to implement this paper correctly. I understand that instead of using an isotropic Gaussian as the prior for the latent space, they are using a mixture of Gaussians. But I am really struggling to understand how they calculate their lower bound, or specifically the terms in it: the reconstruction term, the conditional prior term, the w-prior term, and the z-prior term. The z-prior term is a direct probability of the class a data point would belong to, and I am not sure where they are getting this from. If anybody could offer any help, or point me somewhere I could find some, it would be greatly appreciated!
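One possible reading of where that "direct probability" comes from (this is my hedged sketch, not necessarily the paper's exact formulation): once you have per-component means and variances, the posterior over the cluster variable can be computed in closed form with Bayes' rule, which reduces to a softmax over the component log-likelihoods. All names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes -- not taken from the paper.
n_clusters, latent_dim = 5, 8
z = rng.standard_normal(latent_dim)                  # a latent sample
mus = rng.standard_normal((n_clusters, latent_dim))  # per-component means (stand-ins)
logvars = np.zeros((n_clusters, latent_dim))         # unit variances for simplicity

# log N(z; mu_c, sigma_c^2) per component, up to a shared additive constant.
log_lik = -0.5 * np.sum(logvars + (z - mus) ** 2 / np.exp(logvars), axis=1)

# With a uniform prior p(c) = 1/n_clusters, Bayes' rule is just a softmax
# over the component log-likelihoods -- a direct probability per cluster.
q_c = np.exp(log_lik - log_lik.max())
q_c /= q_c.sum()
```

If that reading is right, no extra network head is needed for the cluster probabilities; they fall out of the mixture components themselves.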

And in summary, here are my questions:

  1. How are they generating a mixture of Gaussians for the latent space? Does this mean creating n distributions for the latent space (where n is the number of clusters), so basically having n sets of mean and variance layers (each with the size of the latent space), rather than just one set of these layers like in a normal variational autoencoder? Or is the latent space still representing all the data, and then Gaussian distributions are sampled from the latent space?
  2. How are they reparameterizing when there are multiple distributions (assuming my understanding of how they are doing the multiple Gaussians is correct, which it very likely is not)?
  3. How are they directly outputting the probability that a sample belongs to a certain distribution, which represents a cluster?
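For what it's worth, here is a minimal numpy sketch of the interpretation in question 1 (n separate mean/variance heads plus a softmax head for cluster probabilities), with each component reparameterized independently. Every name and shape here is an assumption of mine, not something taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- illustrative only.
n_clusters = 5      # number of mixture components / clusters
latent_dim = 8      # size of the latent space z
hidden_dim = 16
hidden = np.tanh(rng.standard_normal(hidden_dim))  # stand-in encoder hidden layer

# Question 1 (one reading): n_clusters separate (mean, log-variance) heads,
# each producing a vector of size latent_dim.
W_mu = rng.standard_normal((n_clusters, hidden_dim, latent_dim)) * 0.1
W_logvar = rng.standard_normal((n_clusters, hidden_dim, latent_dim)) * 0.1
mus = hidden @ W_mu          # shape (n_clusters, latent_dim)
logvars = hidden @ W_logvar  # shape (n_clusters, latent_dim)

# Question 2: each component gets its own epsilon and is reparameterized
# independently -- no averaging of means or variances across components.
eps = rng.standard_normal((n_clusters, latent_dim))
z_per_cluster = mus + np.exp(0.5 * logvars) * eps

# Question 3: one simple option is a separate softmax head that outputs
# the cluster probabilities directly.
W_c = rng.standard_normal((hidden_dim, n_clusters)) * 0.1
logits = hidden @ W_c
q_c = np.exp(logits - logits.max())
q_c /= q_c.sum()
```

This is only a sketch of one plausible architecture under those assumptions, not a claim about what the paper actually does.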

Any help is much appreciated, thank you!

Update: While thinking about it, I got this idea: are they generating n (number of clusters) mean and variance layers, and then averaging them out in order to do the reparameterization for the latent space? Or maybe averaging the reparameterization terms? I don't know if that's right, and it seems crazy expensive computationally if n is large. And wouldn't the mean and variance layers all end up the same if done this way? I don't know, I'm just confused, and it is very late.
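On the averaging idea: if the parameters were averaged before reparameterization, the components would indeed collapse toward the same solution. A more typical way to combine mixture components (sketched below with made-up names; this is an assumption, not the paper's method) is to reparameterize each one separately and then either weight the samples by the cluster probabilities or sample a single component. The cost grows only linearly in n.

```python
import numpy as np

rng = np.random.default_rng(2)
n_clusters, latent_dim = 5, 8

# Hypothetical per-component outputs, as in the questions above.
mus = rng.standard_normal((n_clusters, latent_dim))
logvars = rng.standard_normal((n_clusters, latent_dim)) * 0.1
q_c = rng.dirichlet(np.ones(n_clusters))  # stand-in cluster probabilities

# One reparameterized sample per component -- components stay distinct.
eps = rng.standard_normal((n_clusters, latent_dim))
z_all = mus + np.exp(0.5 * logvars) * eps

# Option A: take the expectation over clusters (weight, don't average params).
z_expected = q_c @ z_all

# Option B: sample one component from q_c and use its z.
c = rng.choice(n_clusters, p=q_c)
z_sampled = z_all[c]
```

Either option keeps the n mean/variance heads free to specialize, which is what makes them represent different clusters in the first place.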

submitted by /u/that_one_ai_nerd