[D] Self-normalizing weights and activations
I need to train a classifier and use the last linear layer as an embedding for other stuff. If possible, I want its weights to stay approximately N(0, 1), i.e. roughly in the -1 to 1 range, throughout training.
Is there a paper that shows a method for updating the weights so that they always have zero mean and unit variance? Does the function weight_norm in PyTorch actually do that?
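To make the question concrete, here is a minimal sketch of the kind of constraint I have in mind. It is not taken from any paper: `Standardize` is a hypothetical name, and the layer sizes, learning rate and dummy loss are placeholders. It assumes it is acceptable to re-standardize the whole weight tensor every time it is read, using `torch.nn.utils.parametrize.register_parametrization`, rather than changing the update rule itself.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize


class Standardize(nn.Module):
    """Hypothetical parametrization: map a raw weight tensor to zero mean, unit variance."""

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps  # guards against division by zero

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Statistics are taken over the whole tensor, not per row.
        return (w - w.mean()) / (w.std() + self.eps)


torch.manual_seed(0)
layer = nn.Linear(128, 64)  # placeholder sizes
# From now on, layer.weight is recomputed from the raw parameter on every access.
parametrize.register_parametrization(layer, "weight", Standardize())

# One dummy training step: the optimizer updates the raw (unconstrained) parameter.
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x = torch.randn(32, 128)
loss = layer(x).pow(2).mean()
loss.backward()
opt.step()

w = layer.weight  # the standardized view used in the forward pass
print(f"mean={w.mean().item():.4f} std={w.std().item():.4f}")
```

With this, the effective weights keep zero mean and unit variance after each step, but that only fixes the first two moments; it does not make the weights Gaussian or bound them to [-1, 1].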
I read the paper Self-Normalizing Neural Networks, but there only the activations are normalized to N(0, 1).
And speaking of self-normalizing models, has anyone built a network that directly outputs values already approximately close to the softmax probabilities, without computing softmax over the logits?