[D] What is the best way to search for a learning rate schedule?
For me, the learning rate schedule is the most irritating hyperparameter to search over, because there seems to be an exponential number of possibilities for when to decay and by how much. What is the most systematic way to search for a good learning rate schedule? Is there a method that automatically decays the learning rate when the loss stops dropping quickly?
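To make the "decay automatically" idea concrete, here is roughly what I have in mind, as a minimal plain-Python sketch. The class name `PlateauDecay` and the parameters `factor`, `patience`, and `min_delta` are hypothetical, picked only for illustration, and the default values are arbitrary.

```python
class PlateauDecay:
    """Hypothetical helper: cut the learning rate when the loss stops improving."""

    def __init__(self, lr, factor=0.5, patience=3, min_delta=1e-3):
        self.lr = lr
        self.factor = factor        # multiply lr by this on each decay
        self.patience = patience    # epochs without improvement before decaying
        self.min_delta = min_delta  # smallest loss drop that counts as improvement
        self.best = float("inf")    # best loss seen so far
        self.bad_epochs = 0         # consecutive epochs without improvement

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

Calling `step(loss)` once per epoch would replace hand-picked decay epochs with three knobs, though those knobs still need tuning, so this only shrinks the search space rather than removing it.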
In my experience, a “step” learning rate decay function works better than a “smooth” one. Is there any paper or blog post that has confirmed or rejected this at large scale, or that systematically analyzes learning rate schedules more generally?
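For clarity, this is the kind of comparison I mean by "step" versus "smooth", sketched in plain Python. The function names and the defaults (`base_lr=0.1`, `drop=0.1`, `every=30`) are assumptions for illustration only; the smooth variant here is an exponential decay chosen so that both schedules agree at every multiple of `every`.

```python
def step_decay(epoch, base_lr=0.1, drop=0.1, every=30):
    """Piecewise-constant schedule: multiply by `drop` once every `every` epochs."""
    return base_lr * drop ** (epoch // every)


def smooth_decay(epoch, base_lr=0.1, drop=0.1, every=30):
    """Exponential schedule with the same overall rate, applied continuously."""
    return base_lr * drop ** (epoch / every)


if __name__ == "__main__":
    # Print both schedules side by side at a few epochs.
    for epoch in (0, 15, 29, 30, 45, 60):
        print(epoch, step_decay(epoch), smooth_decay(epoch))
```

Between the drop points the step schedule holds a higher learning rate than the smooth one, which may be part of the difference I am seeing.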
submitted by /u/nearning