[Discussion] Is MINE (Mutual Information Neural Estimation) also useful for problems that require reducing mutual information?
Hello, I have an old but confusing question about Mutual Information Neural Estimation (MINE), ICML 2018.
In the paper, a lower bound on mutual information is estimated with a neural-network-parameterized function (called the statistics network), and various experiments were run, including the information bottleneck (IB), which reduces I(X; Z).
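For reference, the bound in question is the Donsker-Varadhan one: I(X; Z) >= E_joint[T] - log E_marginals[exp(T)], where T is the statistics network. Below is a minimal, framework-free sketch of how I compute the estimate from one batch of statistics-network outputs. The function name and the plain-list inputs are my own choices for illustration, not something from the paper.

```python
import math

def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan estimate from one batch.

    t_joint:    T(x, z) on samples drawn from the joint P(X, Z)
    t_marginal: T(x, z') on samples where z' is shuffled, i.e. drawn
                from the product of marginals P(X) P(Z)
    (Both argument names are hypothetical, chosen for this sketch.)
    """
    mean_joint = sum(t_joint) / len(t_joint)
    # log(mean(exp(t))) with the max subtracted for numerical stability
    m = max(t_marginal)
    mean_exp = sum(math.exp(t - m) for t in t_marginal) / len(t_marginal)
    log_mean_exp = m + math.log(mean_exp)
    return mean_joint - log_mean_exp
```

Note that the estimate is unchanged if a constant is added to every output of T, so only differences between the joint and shuffled outputs matter.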
It’s very well written, with a solid theoretical background, but I’m stuck reimplementing the IB results. Unfortunately, the paper doesn’t provide full details for the IB section, so if you have any experience using MINE to reduce mutual information, I’d be very grateful if you shared it. I built a statistics network following the paper, and I train it while using its estimated MI lower bound as the I(X; Z) regularizer. But training seems very sensitive to the initial value of exponential_moving_average(exp(t)). My error rate hovers around 1.5%, which is even worse than a vanilla FCN.
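For context, this is roughly the moving-average step I mean. As I understand the paper, the gradient of the log term is E[grad T * exp(T)] / E[exp(T)], and the batch mean in the denominator is replaced with an exponential moving average to reduce the bias of the minibatch gradient. The sketch below only computes the per-sample weights exp(t_i) / (N * ema) that would multiply grad T. The class name, the decay of 0.99, and seeding the average from the first batch instead of a fixed constant are my assumptions, not details from the paper.

```python
import math

class EmaDenominator:
    """Tracks an exponential moving average of the batch mean of exp(t)."""

    def __init__(self, decay=0.99):
        self.decay = decay  # assumed value, not taken from the paper
        self.ema = None     # seeded lazily from the first batch

    def update(self, t_marginal):
        batch_mean = sum(math.exp(t) for t in t_marginal) / len(t_marginal)
        if self.ema is None:
            # Assumption: start from the first batch rather than a constant
            self.ema = batch_mean
        else:
            self.ema = self.decay * self.ema + (1.0 - self.decay) * batch_mean
        return self.ema

    def weights(self, t_marginal):
        """Per-sample weights for grad T in the log term, with the EMA as denominator."""
        ema = self.update(t_marginal)
        n = len(t_marginal)
        return [math.exp(t) / (n * ema) for t in t_marginal]
```

With a fixed initial value such as 1.0, the early steps use a denominator unrelated to the actual scale of exp(t), which might be part of why my runs are so sensitive to it.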
Also, I’m not fully convinced that such MI lower-bound estimators are suitable for problems that require reducing MI. Does reducing the ‘approximated’ lower bound of MI guarantee an actual reduction of MI? I think training the MI estimator (which maximizes the bound) while simultaneously minimizing that same estimated bound might be unstable; like a GAN, it is a kind of minimax training. On the other hand, if the statistics network and our designed loss are consistent (both increase the lower bound), I think there is no problem. What do you think?
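To make the concern concrete, here is the objective as I understand it, written in my own notation (the weight \beta, the encoder parameters \phi, and the task loss \mathcal{L}_{\text{task}} are my labels, not taken from the paper):

```latex
\hat{I}_\theta(X; Z)
  = \mathbb{E}_{P_{XZ}}\!\left[T_\theta\right]
  - \log \mathbb{E}_{P_X \otimes P_Z}\!\left[e^{T_\theta}\right]
  \;\le\; I(X; Z)
  \quad \text{for every } \theta,

\min_{\phi} \Big[ \mathcal{L}_{\text{task}}(\phi)
  + \beta \, \max_{\theta} \hat{I}_\theta(X; Z_\phi) \Big].
```

Since \hat{I}_\theta never exceeds I(X; Z) for any \theta, the encoder can lower the penalty either by actually reducing I(X; Z) or by moving to a region where the current statistics network gives a loose bound. So the penalty only tracks the true MI if the inner maximization is kept close to optimal, and that is the GAN-like min-max structure I’m worried about.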