[D] Activation masking (pruning): how to calculate pruned weights (weights effectively zeroed by zero activations)?
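Calculating the percentage of pruned weights after pruning them directly is easy: (1 - nonzero_params/all_params) * 100% (nonzero_params/all_params * 100% on its own gives the percentage of weights that remain).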
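For reference, the direct-pruning case is just a count of zeros over all weight tensors. A minimal NumPy sketch, assuming the model's weights are available as a list of arrays (the function name `pruned_percentage` is hypothetical):

```python
import numpy as np

def pruned_percentage(weights):
    """Percentage of weights that are exactly zero across a list of arrays."""
    total = sum(w.size for w in weights)
    nonzero = sum(int(np.count_nonzero(w)) for w in weights)
    # nonzero / total is the density (kept fraction); 1 - density is the sparsity
    return 100.0 * (1 - nonzero / total)

layers = [np.array([[0.0, 1.0], [2.0, 0.0]]), np.array([0.0, 0.0, 3.0, 4.0])]
print(pruned_percentage(layers))  # 4 of 8 weights are zero -> 50.0
```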
However, I can't find any papers or methods for calculating pruned weights under activation-mask-based pruning.
Does anyone know how to calculate the weights pruned due to pruned (masked) activations? In this case no weights are pruned directly; instead, weights are effectively pruned by placing a mask on the activations that feed into them (because any weight multiplied by a zero activation contributes nothing to the output).
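For a fully connected layer the bookkeeping is simple: with y = W x, weight W[j, k] only ever multiplies x[k], so if x[k] is masked, the whole column k is effectively pruned. A minimal sketch, assuming a static input mask (1 = kept, 0 = masked) and counting a weight as pruned if it is either exactly zero or sits in a masked column; `effective_sparsity_linear` is a hypothetical name:

```python
import numpy as np

def effective_sparsity_linear(weight, in_mask):
    """Percentage of weights in y = weight @ x that are zero or only see masked inputs.

    weight:  (out_features, in_features)
    in_mask: (in_features,), nonzero = activation kept, 0 = activation masked
    """
    weight = np.asarray(weight)
    in_mask = np.asarray(in_mask) != 0
    # weight[:, k] multiplies x[k]; broadcast the per-input mask across all rows
    dead = (weight == 0) | ~in_mask[np.newaxis, :]
    return float(100.0 * dead.sum() / dead.size)

w = np.ones((3, 4))
print(effective_sparsity_linear(w, [1, 0, 1, 0]))  # 2 of 4 columns masked -> 50.0
```

If the mask is input-dependent rather than static (e.g. zeros produced by ReLU), one possible convention is to treat an input as live if it is nonzero on at least one sample, e.g. `in_mask = (acts != 0).any(axis=0)` for activations of shape (N, in_features).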
Unlike ordinary weight pruning, this requires propagating the zeros through matmuls and convolutions, taking stride, padding, kernel size, etc. into account.
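For a conv layer, one way to frame it: kernel tap w[o, c, i, j] is effectively pruned if, at every output position, the input it lands on is either masked or zero padding. That set depends only on (c, i, j), not on the output channel o, so it is the same for every output channel. A minimal NumPy sketch under these assumptions: a static (C_in, H, W) mask, square stride and symmetric padding, no dilation or groups. `effectively_pruned_conv2d` is a hypothetical name:

```python
import numpy as np

def effectively_pruned_conv2d(mask, c_out, kernel_size, stride=1, padding=0):
    """Count conv weights that only ever multiply masked or padded inputs.

    mask: (C_in, H, W), nonzero = activation kept, 0 = masked (assumed static).
    Returns (dead_weights, total_weights, percentage).
    """
    mask = (np.asarray(mask) != 0).astype(np.int8)
    c_in, h, w = mask.shape
    kh, kw = kernel_size
    # Zero padding behaves like a permanently masked activation
    padded = np.pad(mask, ((0, 0), (padding, padding), (padding, padding)))
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w + 2 * padding - kw) // stride + 1
    used = np.zeros((c_in, kh, kw), dtype=bool)
    for i in range(kh):
        for j in range(kw):
            # All inputs that kernel tap (i, j) touches over every output position
            window = padded[:, i : i + stride * h_out : stride,
                               j : j + stride * w_out : stride]
            used[:, i, j] = window.any(axis=(1, 2))
    dead = c_out * int((~used).sum())  # same dead taps for every output channel
    total = c_out * c_in * kh * kw
    return dead, total, 100.0 * dead / total

m = np.ones((1, 4, 4))
m[:, :, 1] = 0
m[:, :, 3] = 0
print(effectively_pruned_conv2d(m, c_out=3, kernel_size=(2, 2), stride=2))  # (6, 12, 50.0)
print(effectively_pruned_conv2d(m, c_out=3, kernel_size=(2, 2), stride=1))  # (0, 12, 0.0)
```

The two calls show why stride matters: with stride 2 the right-hand kernel column only ever lands on masked columns 1 and 3, while with stride 1 it also reaches live column 2. Padding can kill taps too: a 3x3 kernel with padding 1 and stride 2 on an unmasked 2x2 input has a single output position, where 5 of the 9 taps fall on padding.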