[D] Increasing regularization during deep network training
I recently read a paper that suggested increasing regularization parameters (weight decay, dropout rate, etc.) over the course of training a deep network to avoid overfitting; however, I cannot remember the name of the paper. I tried searching the literature, but terms like “increase regularization deep learning” haven’t turned up much (unsurprisingly).
I did find Curriculum Dropout, which suggests increasing the dropout rate during training, but I don’t believe this is the paper I had in mind.
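To make the question concrete, here is a minimal sketch of the kind of schedule I mean. It assumes a simple linear ramp from a weak setting to a strong one; this is purely an illustration, not the schedule from Curriculum Dropout or any other paper, and the function names and default ranges are hypothetical.

```python
def linear_ramp(step, total_steps, start, end):
    """Linearly interpolate from start to end over total_steps, then hold at end."""
    if total_steps <= 0:
        return end
    # Clamp progress to [0, 1] so values stay fixed once the ramp is finished.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def regularization_at(step, total_steps,
                      dropout_range=(0.0, 0.5),
                      weight_decay_range=(0.0, 5e-4)):
    """Return (dropout_rate, weight_decay) for a given training step.

    The default ranges are assumed placeholder values, not recommendations.
    """
    dropout = linear_ramp(step, total_steps, *dropout_range)
    weight_decay = linear_ramp(step, total_steps, *weight_decay_range)
    return dropout, weight_decay


if __name__ == "__main__":
    # Print the schedule at the start, midpoint, and end of the ramp.
    for step in (0, 500, 1000):
        print(step, regularization_at(step, total_steps=1000))
```

In PyTorch, for example, I would expect to apply these values once per epoch by setting `weight_decay` on each entry of `optimizer.param_groups` and updating the `p` attribute of the `nn.Dropout` modules, though I haven't tested whether this actually helps.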
Does anyone happen to know of other papers discussing this subject? Are there any emerging trends around changing regularization parameters during training? Does anyone have experience testing this idea out?