[P] AdamWR Keras Full Implementation Available
The latest Lookahead optimizer paper, co-authored by Geoffrey Hinton, used AdamW as its base optimizer and noted that it outperformed plain Adam. To the best of my knowledge, no complete implementation of AdamW in Keras existed until now, so I wrote one.
It also includes NadamW and SGDW, along with their WR (Warm Restart) counterparts, which use a cosine annealing learning rate schedule, plus per-layer learning rate multipliers (useful for pretraining). All optimizers are well-tested and, in my case, have yielded 3-4% F1-score improvements in already-tuned models for seizure classification. The code is up to date with Keras 2.3.0.
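For anyone unfamiliar with the two ideas, here is a minimal sketch in plain Python of decoupled weight decay and cosine annealing with warm restarts, roughly following the SGDW formulation without momentum. It is not the package's code or API: the function names, parameter names, and default values below are made up for illustration.

```python
import math

def cosine_annealing_multiplier(t_cur, total_iterations, eta_min=0.0, eta_max=1.0):
    """Schedule multiplier eta_t that decays from eta_max to eta_min over one cycle.

    t_cur is the number of iterations since the last restart (hypothetical name).
    """
    cos_term = 1.0 + math.cos(math.pi * t_cur / total_iterations)
    return eta_min + 0.5 * (eta_max - eta_min) * cos_term

def sgdw_step(w, grad, lr, weight_decay, eta_t, lr_multiplier=1.0):
    """One plain-SGD step with decoupled weight decay (no momentum).

    The decay term acts on the weight directly instead of being added to the
    gradient, and both terms are scaled by the schedule multiplier eta_t.
    lr_multiplier stands in for a per-layer learning rate multiplier.
    """
    return w - eta_t * (lr * lr_multiplier * grad + weight_decay * w)

# Warm restarts: reset t_cur to 0 at the start of every cycle.
cycle_length = 10  # assumed cycle length, in iterations
schedule = [cosine_annealing_multiplier(t % cycle_length, cycle_length)
            for t in range(2 * cycle_length)]
```

The multiplier starts each cycle at `eta_max`, decays along a half cosine, and jumps back up at the restart. A layer with `lr_multiplier < 1` takes smaller gradient steps than the rest of the network, which is the usual reason to use multipliers on pretrained layers.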
I recommend giving it a go. Any feedback is welcome.