[D] Memory Aware Synapses (MAS): how to compute the additional loss term?
I am currently reading the paper “Memory Aware Synapses: Learning what (not) to forget” (https://arxiv.org/abs/1711.09601) and am trying to figure out how to compute the additional loss term. The weight importance matrix Ω is (as I understand it) just the accumulated absolute gradients of the network's output norm with respect to each individual weight. But how is the penalty Σ Ω(θ − θ*)² computed, specifically θ* (the parameter values saved after training the previous task)? I tried looking through the official git repository, but couldn't find the answer. Does anyone have experience with this approach?
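To make my question concrete, here is my current understanding of the procedure as a sketch (assuming PyTorch; the toy `nn.Linear` model, data shapes, and the `mas_penalty` helper are mine, not from the official repo — please correct me if I got the paper wrong):

```python
import torch
import torch.nn as nn

# Toy stand-in for the network trained on the previous task (hypothetical sizes).
model = nn.Linear(4, 3)
data = [torch.randn(8, 4) for _ in range(5)]  # unlabeled batches

# Estimate Omega: accumulate |d/dtheta of ||f(x)||^2| over the data,
# i.e. the sensitivity of the squared L2 norm of the output to each weight.
omega = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
for x in data:
    model.zero_grad()
    out = model(x)
    out.pow(2).sum().backward()  # squared L2 norm, summed over the batch
    for n, p in model.named_parameters():
        omega[n] += p.grad.abs()
n_samples = sum(x.shape[0] for x in data)
omega = {n: g / n_samples for n, g in omega.items()}

# Snapshot theta* = the parameters right after finishing the previous task.
theta_star = {n: p.detach().clone() for n, p in model.named_parameters()}

# While training the next task, the extra loss term would then be:
def mas_penalty(model, omega, theta_star, lam=1.0):
    reg = 0.0
    for n, p in model.named_parameters():
        reg = reg + (omega[n] * (p - theta_star[n]).pow(2)).sum()
    return lam * reg
```

So the penalty is zero immediately after the snapshot and grows as the new task's training drifts the important weights away from θ*. Is this how Ω and θ* are supposed to interact, or am I missing something?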