[D] NLL loss implemented in Bengio’s VRNN paper for Gaussian mixture model seems strange to me…
I’m currently trying to make an MLP which outputs statistics for Gaussian Mixture Model (GMM), i.e., means, covariance matrixes, and coefficients for each mixture components. This approach is already have been explored by others, such as Variational RNN paper (2016, https://arxiv.org/pdf/1506.02216.pdf ). Although I’m not interested in posterior distribution and not going to implement my NN as Variational form, I tried to dig into the source code since the form of NLL loss function should be the same. Fortunately, the authors provided their GitHub link ( https://github.com/jych/nips2015_vrnn ) and I could find a form of loss function they employed. Following code is the cost function for bivariate GMM model.(https://github.com/jych/cle/blob/master/cle/cost/__init__.py ) https://i.redd.it/7k435xush0w21.png I’ve found that the highlighted part seems… not right. Here is the log-likelihood function for Gaussian Mixture Model. https://i.redd.it/pfmf560ei0w21.png This function is rather intractable, since the log contains a weighted sum. As far I know, we need to relax the problem and classic EM iteration is necessary to optimize this log-likelihood function. (Actually, there are many papers solely focused on this problem; such as neural EM. (https://papers.nips.cc/paper/7246-neural-expectation-maximization.pdf )) In EM algorithm, after you calculate gamma (posterior of component categorical distribution) in ‘E’ step, you maximize the log-likelihood (‘M’ step’) by maximizing following formula. https://i.redd.it/k9w6ast9j0w21.png But it seems like VRNN paper’s implementation of NLL loss just simply ignored the gamma function (or E-step), and assume that red-circled part is sufficient for loss. (Ignore c_b part, which is a binary entropy term for non-GMM posterior.) Furthermore, the whole calculation of VRNN’s code doesn’t involve the variable ‘coeff’ (π) except the highlighted part. It means that NLL loss just contains the log value of the coefficient, which makes all coefficient equal to 1/N in order to minimize negative log value. Is this kind of formulation is appropriate for Gaussian mixture model? (Like, is it some sort of linear approximation of original form?) Otherwise, is there a simple way to calculate (or approximate) NLL loss for GMM? Please let me know If I got anything wrong. I’m relatively new to this field and need constructive discussion for this topic. Thanks! submitted by /u/nokpil |