[P] Help implementing “Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders” paper?

Written by torontoai on July 18, 2019. Posted in Reddit MachineLearning.

Here is the link the paper I am referring to: https://arxiv.org/pdf/1611.02648.pdf

I am having trouble understanding how to implement this paper correctly. So I understand that instead of using an isotropic Gaussian as the prior for the latent space, they are using a mixture of Gaussians. And then I am really having trouble understanding how they are calculating their lower bound, or specifically the terms in it, which are reconstruction term, conditional prior term, w-prior term and z-prior term. The z-prior term is a direct probability of the class a data point would belong to, and I am not sure where they are getting this from. So if anybody could offer any help or point me to somewhere I could find some help, it would be greatly appreciated!

And in summary, here are my questions:

How are they generating a mixture of Gaussians for the latent space? Does this mean creating n distributions for the latent space (where n is the number of clusters), so basically having n sets of mean and variance layers (each with the size of the latent space), rather than just one set of these layers like in a normal variational autoencoder? Or is the latent space still representing all the data, and then Gaussian distributions are sampled from the latent space?
How are they reparameterizing the distributions of multiple distributions (assuming my understanding of how they are doing the multiple Gaussians correct, which it’s very likely not)?
How are they directly outputting the probability that a sample belongs to a certain distribution, which represents a cluster?

Any help is much appreciated, thank you!

Update: While thinking about it, I got this idea – they are generating n (number of clusters) mean and variance layers, and then averaging them out in order to do the reparametrization for the latent space? Or maybe averaging the reparametrization terms? I don’t know if that’s right, and it seems crazy expensive computationally if the value of n is large? And wouldn’t the mean and variance layers all be the same if done this way? I don’t know, I’m just confused, and it is very late.

submitted by /u/that_one_ai_nerd
[link] [comments]

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

JOB POSTINGS

CONTACT

[P] Help implementing “Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders” paper?