[P] Learning Rate Dropout in PyTorch
I just implemented learning rate dropout using PyTorch! Instead of dropping the weights themselves, this technique applies a random dropout mask to the weight update at each iteration, so on every step only a random subset of parameter coordinates actually moves.
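To make the idea concrete, here is a minimal sketch of what such an optimizer could look like: an SGD-with-momentum step where a fresh Bernoulli mask gates each coordinate's update. The class name `SGDLRDropout`, the `lr_dropout` keep probability, and the exact placement of the mask are my assumptions for illustration, not necessarily how my implementation or the paper does it.

```python
import torch
from torch.optim import Optimizer


class SGDLRDropout(Optimizer):
    """SGD with momentum where each coordinate's update is kept with
    probability `lr_dropout` (a Bernoulli mask on the update, not the
    weights). Hypothetical sketch for illustration."""

    def __init__(self, params, lr=0.1, momentum=0.9, lr_dropout=0.5):
        defaults = dict(lr=lr, momentum=momentum, lr_dropout=lr_dropout)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr = group['lr']
            momentum = group['momentum']
            p_keep = group['lr_dropout']
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                buf = state.get('momentum_buffer')
                if buf is None:
                    # First step: initialize the momentum buffer from the gradient.
                    buf = torch.clone(p.grad).detach()
                    state['momentum_buffer'] = buf
                else:
                    buf.mul_(momentum).add_(p.grad)
                # Sample a fresh Bernoulli mask each step. Momentum is still
                # accumulated for every coordinate; only the masked coordinates
                # actually receive this step's update.
                mask = torch.bernoulli(torch.full_like(p, p_keep))
                p.add_(buf * mask, alpha=-lr)
        return loss
```

One design point worth flagging: this version keeps accumulating momentum for dropped coordinates and only gates the applied update, which is one plausible reading of "dropout on the weight update"; masking the momentum buffer itself would give different dynamics.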
I welcome any and all feedback! I ran four trials with a ResNet-34 model on CIFAR-10 using both the baseline optimizer (SGD with momentum) and this variant, but I wasn't able to match the numbers reported in the paper. Feel free to double-check the masking logic or hyperparameters in case that explains the difference.
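For reference, a hypothetical way to wire the sketch above into the setup described in this post (ResNet-34 on CIFAR-10); the keep probability of 0.5 and the other hyperparameters are assumed values, so they are worth checking against the paper:

```python
import torchvision

# Assumes the SGDLRDropout sketch above is in scope.
model = torchvision.models.resnet34(num_classes=10)  # CIFAR-10 has 10 classes
optimizer = SGDLRDropout(model.parameters(), lr=0.1, momentum=0.9, lr_dropout=0.5)
```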