[P] AdamWR Full Keras + TF-Keras Implementation Available
A follow-up to the original post (pasted shortened below), with major changes; release v1.1:
- Run-based weight decay normalization scheme, normalizing over an arbitrary number of iterations independently of the LR scheduler (e.g. over all epochs) – see the sketch after this list
- Full compatibility with TensorFlow 2.0.0 and Keras 2.3.0 (`keras` and `tensorflow.keras`)
- Full compatibility with TensorFlow 1.14.0 and Keras 2.2.5 (`keras` and `tensorflow.keras`)
- Also compatible with TF 1.13.0 & 1.15.0, Keras 2.2.3-2.2.4
For a complete list of changes, see the release notes. Optimizers are here.
The latest Lookahead optimizer paper, co-authored by Geoffrey Hinton, used AdamW as its base optimizer and noted that it performed better than plain Adam.
NadamW and SGDW are included, along with their WR (Warm Restarts) counterparts – with a cosine annealing learning rate schedule and per-layer learning rate multipliers (useful for pretraining). All optimizers are well-tested, and for me have yielded 3-4% F1-score improvements in already-tuned models for seizure classification.
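A minimal usage sketch of how these pieces might fit together; the constructor arguments (`weight_decays`, `lr_multipliers`, `use_cosine_annealing`, `total_iterations`) are assumptions inferred from the feature list above, not a copy of the package's documented API:

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from keras_adamw import AdamW  # assumed import path for the package described here

# Small toy model with named layers, so per-layer settings can reference them
ipt = Input(shape=(16,))
x = Dense(32, activation='relu', name='dense_1')(ipt)
out = Dense(1, activation='sigmoid', name='output')(x)
model = Model(ipt, out)

# Assumed constructor arguments, mirroring the features described above:
#  - weight_decays: per-layer (normalized) weight decay values
#  - lr_multipliers: per-layer learning rate multipliers (e.g. a smaller LR
#    on pretrained layers)
#  - use_cosine_annealing + total_iterations: Warm Restarts cosine schedule,
#    with decay normalized over the run length
optimizer = AdamW(
    lr=1e-3,
    weight_decays={'dense_1': 1e-4, 'output': 1e-4},
    lr_multipliers={'dense_1': 0.5},
    use_cosine_annealing=True,
    total_iterations=1000,
)
model.compile(optimizer, loss='binary_crossentropy')

# Dummy data just to show the optimizer plugs into a normal fit() loop
X = np.random.randn(64, 16)
y = np.random.randint(0, 2, (64, 1))
model.fit(X, y, epochs=2, batch_size=32)
```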
submitted by /u/OverLordGoldDragon