[D] Why do disentangling methods not result in independent dimensions of the learned representation?
By disentangling methods I mean methods within the VAE framework, such as FactorVAE and β-TCVAE, which explicitly regularize the total correlation of the aggregate posterior q(z) ≈ (1/N) Σ_n q(z | x_n).
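To make the regularized quantity concrete, here is a minimal sketch that estimates the total correlation TC(q(z)) = KL(q(z) || Π_j q(z_j)) of the aggregate posterior when every q(z | x_n) is a diagonal Gaussian. It assumes the encoder means and log-variances are already stored as (N, d) NumPy arrays. The function names are hypothetical, and this is a brute-force Monte Carlo estimate over the whole (small) dataset, not the estimator either FactorVAE or β-TCVAE actually uses.

```python
import numpy as np

def log_normal_diag(z, mu, logvar):
    # Elementwise log N(z; mu, exp(logvar)); shapes broadcast.
    return -0.5 * (np.log(2 * np.pi) + logvar + (z - mu) ** 2 / np.exp(logvar))

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def aggregate_tc(mu, logvar, num_samples=2000, seed=0):
    """Monte Carlo estimate of TC(q(z)) for q(z) = (1/N) sum_n N(z; mu_n, diag(exp(logvar_n)))."""
    rng = np.random.default_rng(seed)
    n, d = mu.shape
    # Sample z ~ q(z): pick a data index uniformly, then sample its diagonal Gaussian.
    idx = rng.integers(0, n, size=num_samples)
    z = mu[idx] + np.exp(0.5 * logvar[idx]) * rng.standard_normal((num_samples, d))
    # Per-dimension log densities under every component: shape (S, N, d).
    lp = log_normal_diag(z[:, None, :], mu[None, :, :], logvar[None, :, :])
    # Joint density: log q(z) = logsumexp_n sum_j lp[:, n, j] - log N.
    log_qz = logsumexp(lp.sum(axis=2), axis=1) - np.log(n)
    # Marginals: each q(z_j) is a 1-D mixture, log q(z_j) = logsumexp_n lp[:, n, j] - log N.
    log_qzj = logsumexp(lp, axis=1) - np.log(n)
    # TC = E_q[log q(z) - sum_j log q(z_j)].
    return float(np.mean(log_qz - log_qzj.sum(axis=1)))
```

Because each component is diagonal, every marginal q(z_j) is itself a one-dimensional mixture, which is what makes the exact per-dimension computation above possible. For realistic dataset sizes this is too expensive; FactorVAE instead uses a discriminator-based density-ratio trick and β-TCVAE a minibatch-weighted sampling estimator.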
Locatello et al., in their large-scale study of disentanglement methods (1), provide empirical evidence that the dimensions of the mean representation of q(z|x) (which is what is usually used as the representation) are correlated. However, it seems to me that the dimensions of the mean representation should be independent by definition if we use a factorial distribution, such as a diagonal-covariance Gaussian, to represent the posterior.

Moreover, averaging this representation over the data distribution should also give a factorial distribution if we assume that the aggregate posterior q(z) is factorial (proof in (2)). So I think the claim in (1) is wrong.
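One way to check this empirically is to collect the encoder means μ(x_n) over a dataset and measure how dependent their dimensions are, for example via the correlation matrix and the total correlation of a Gaussian fitted to them. The sketch below assumes the means are already in an (N, d) NumPy array; the function names and this particular Gaussian-fit estimator are illustrative assumptions, not necessarily the exact measurement used in (1).

```python
import numpy as np

def gaussian_tc(reps):
    """Total correlation of a Gaussian fitted to representation vectors (rows = data points).

    For N(0, S): TC = 0.5 * (sum_j log S_jj - log det S), which is zero iff S is diagonal.
    """
    cov = np.cov(reps, rowvar=False)
    _, logdet = np.linalg.slogdet(cov)
    return float(0.5 * (np.sum(np.log(np.diag(cov))) - logdet))

def correlation_matrix(reps):
    # Pearson correlations between representation dimensions across the dataset.
    return np.corrcoef(reps, rowvar=False)
```

For example, passing the stacked means of a trained encoder to `gaussian_tc` gives a single number that is close to zero only if the mean dimensions are (linearly) uncorrelated across the data.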