[Project] Implementing Improvements to Hypergradient optimizers
We explore improvements to the existing hypergradient-based optimizers proposed in the paper Online Learning Rate Adaptation with Hypergradient Descent (Baydin et al., ICLR 2018).
We hypothesize that the hypergradient-based learning rate update can be made more accurate, and aim to exploit its gains further by augmenting the learning-rate updates with momentum and adaptive gradients. We experiment with
- Hypergradient descent with momentum, and
- Adam with Hypergradient,
applied alongside the underlying model optimizers SGD, SGD with Nesterov momentum (SGDN), and Adam.
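To make the idea concrete, the sketch below shows plain SGD whose learning rate is itself adapted by gradient descent on the hypergradient (the dot product of consecutive gradients, per Baydin et al.), with a momentum buffer added to the learning-rate update as in the first variant above. This is an illustrative NumPy sketch under assumed names and hyperparameters (`beta`, `mu`, `sgd_hd_momentum`), not the actual implementation from the repository.

```python
import numpy as np

def sgd_hd_momentum(grad_fn, theta, alpha=0.01, beta=1e-4, mu=0.9, steps=100):
    """SGD with a hypergradient-adapted learning rate, where the
    learning-rate update itself carries momentum (illustrative sketch)."""
    prev_grad = np.zeros_like(theta)
    v = 0.0  # momentum buffer for the hypergradient
    for _ in range(steps):
        g = grad_fn(theta)
        # Hypergradient of the loss w.r.t. alpha (Baydin et al., 2018):
        # d loss / d alpha = -grad_t . grad_{t-1}
        h = -np.dot(g, prev_grad)
        v = mu * v + h            # momentum on the learning-rate update
        alpha = alpha - beta * v  # online learning-rate adaptation
        theta = theta - alpha * g # standard SGD step with adapted alpha
        prev_grad = g
    return theta, alpha
```

On a simple convex quadratic, successive gradients point the same way, so the hypergradient steadily increases the learning rate and convergence accelerates; the momentum term smooths that adaptation, which is the intuition behind the first proposed variant.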
The new optimizers are compared against their respective hypergradient-descent baselines and show advantages such as better generalization and faster convergence of the loss. The code and results of our experiments are available at https://github.com/harshalmittal4/Hypergradient_variants.