
[D] Besides decaying the learning rate and increasing the batch size: decay momentum? Decay dropout rate? Increase L2 regularization?

Decaying the learning rate is a popular practice, even with adaptive optimizers such as Adam. Increasing the batch size has also been shown to have a similar effect.
But other hyperparameters have a similar nature:
– Does it make sense to decay or increase them over the course of training?
– Has anyone tried decaying momentum, decaying the dropout rate, or increasing L2 regularization? (A rough sketch of how such schedules could be wired up is given after this list.)
– Are there other hyperparameters that benefit from this kind of scheduling?
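
For concreteness, here is a minimal sketch of what such per-epoch schedules could look like (assuming PyTorch; the toy model and the specific schedule shapes and ranges below are hypothetical design choices, not recommendations):

```python
import torch
import torch.nn as nn

# Toy model with a dropout layer so the dropout probability can be scheduled.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-5)

num_epochs = 50
for epoch in range(num_epochs):
    frac = epoch / max(1, num_epochs - 1)  # progress through training, 0 -> 1

    # Hypothetical linear schedules:
    lr = 1e-3 * (1.0 - 0.9 * frac)               # decay learning rate 1e-3 -> 1e-4
    beta1 = 0.9 - 0.4 * frac                     # decay Adam's "momentum" (beta1) 0.9 -> 0.5
    weight_decay = 1e-5 + (1e-3 - 1e-5) * frac   # increase L2 penalty 1e-5 -> 1e-3
    drop_p = 0.5 * (1.0 - frac)                  # decay dropout probability 0.5 -> 0.0

    # Learning rate, beta1, and weight decay live in the optimizer's param groups.
    for group in optimizer.param_groups:
        group["lr"] = lr
        group["betas"] = (beta1, group["betas"][1])
        group["weight_decay"] = weight_decay

    # Dropout probability lives on the Dropout modules themselves.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = drop_p

    # ... run one training epoch with these hyperparameter values ...
```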

submitted by /u/thntk
