[R] Thoughts on Mutual Information: More Estimators and Formal Limitations
Mutual Information is an important measure of dependence between two random variables and is often used in ML for various purposes (e.g. representation learning). Over the last year there have been some new results and papers that made me curious about the topic, and I wanted to share some of my thoughts.
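For concreteness, throughout I mean the standard definition of mutual information as the KL divergence between the joint and the product of marginals:

```latex
I(X; Y)
= \mathbb{E}_{p(x,y)}\left[ \log \frac{p(x,y)}{p(x)\,p(y)} \right]
= D_{\mathrm{KL}}\big( p(x,y) \,\|\, p(x)\,p(y) \big)
```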
In the first post I introduce a couple of new estimators based on bounds on the log marginal likelihood. Some of them are very appealing, as they give us 1) both lower and upper bounds, and 2) a way to make the bounds tighter by investing more computation.
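The connection to the log marginal likelihood can be sketched as follows (a rough sketch, assuming the conditional $p(x \mid y)$ is tractable): write MI as

```latex
I(X; Y)
= \mathbb{E}_{p(x,y)}\big[ \log p(x \mid y) \big]
- \mathbb{E}_{p(x)}\big[ \log p(x) \big]
```

Since $\log p(x)$ enters with a minus sign, any lower bound on the log marginal likelihood (e.g. the ELBO, or its multi-sample IWAE refinement) yields an upper bound on MI, and any upper bound on $\log p(x)$ yields a lower bound on MI; multi-sample bounds tighten as more samples (computation) are used.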
In the second post I move on to the analysis of (some of) these estimators, and also address the Formal Limitations paper – a paper that, loosely speaking, rules out good samples-only (blackbox) bounds on Mutual Information. In particular, I show how this issue manifests itself in several widely used blackbox bounds, and then contrast them with bounds that use some knowledge of the distributions.
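To give a loose paraphrase of the limitation (stated informally here; see the paper for the precise conditions): any distribution-free lower bound $B$ on $I(X;Y)$ that holds with high confidence given only $N$ samples cannot exceed roughly the log of the sample size,

```latex
B \lesssim \log N
```

so certifying, say, 30 nats of MI from samples alone would require on the order of $e^{30}$ samples.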
The discussion might not be quite introductory, so I recommend first checking out a recent ICML paper, On Variational Bounds of Mutual Information.