[D] Efficient GPU implementation of Empirical Fisher information matrix?
I have seen many implementations, and it seems to be a limitation of autograd itself that the gradient of the log-likelihood can only be computed one sample at a time.
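Batched versions exist, but they are used in a WRONG way.
The ones I have seen compute the gradient of the log-likelihood over a whole batch, which is essentially a mean of per-sample gradients. Taking the outer product of that mean is not the same as averaging the per-sample outer products, so it is not faithful to the real Empirical Fisher calculation at all (only a kind of approximation).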
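
To make the difference concrete, here is a minimal NumPy sketch of the two quantities. It assumes a logistic regression model so that the per-sample gradients can be written in closed form without autograd. The function names (`per_sample_grads`, `empirical_fisher`, `batch_gradient_outer`) and the random data are my own illustrative choices, not taken from any library or from the implementations mentioned above.

```python
import numpy as np


def per_sample_grads(w, X, y):
    """Gradient of log p(y_i | x_i, w) w.r.t. w for every sample, shape (N, D).

    Assumed model: logistic regression, for which
    d/dw log p(y_i | x_i, w) = (y_i - sigmoid(x_i . w)) * x_i.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (y - p)[:, None] * X


def empirical_fisher(G):
    """Mean of per-sample outer products: (1/N) * sum_i g_i g_i^T."""
    return G.T @ G / G.shape[0]


def batch_gradient_outer(G):
    """Outer product of the batch-mean gradient: g_bar g_bar^T (rank at most 1)."""
    g_bar = G.mean(axis=0)
    return np.outer(g_bar, g_bar)


# Hypothetical toy data, only for illustration.
rng = np.random.default_rng(0)
N, D = 64, 3
X = rng.normal(size=(N, D))
y = (rng.random(N) < 0.5).astype(float)
w = rng.normal(size=D)

G = per_sample_grads(w, X, y)      # (N, D) matrix of per-sample gradients
F_emp = empirical_fisher(G)        # the quantity I actually want
F_batch = batch_gradient_outer(G)  # what the "batched" shortcut computes
```

The two matrices differ by exactly the covariance of the per-sample gradients, `F_emp - F_batch = (1/N) * sum_i (g_i - g_bar)(g_i - g_bar)^T`, so the shortcut only matches when every sample has the same gradient.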
Is there a correct, GPU-efficient implementation of the Empirical Fisher out there?