[P] AdamWR Full Keras + TF-Keras Implementation Available
A follow-up to the original post (pasted shortened below), with major changes; release v1.1:
- Run-based weight decay normalization scheme, normalizing over an arbitrary number of iterations independently of the LR scheduler (e.g. over all epochs) – see the sketch after this list
- Full compatibility with TensorFlow 2.0.0 and Keras 2.3.0 (`keras` and `tensorflow.keras`)
- Full compatibility with TensorFlow 1.14.0 and Keras 2.2.5 (`keras` and `tensorflow.keras`)
- Also compatible with TF 1.13.0 & 1.15.0, Keras 2.2.3-2.2.4
For a complete list of changes, see the release notes. Optimizers are here.
The latest Lookahead optimizer paper, co-authored by Geoffrey Hinton, used AdamW as its base optimizer and noted that it performed better than plain Adam.
NadamW and SGDW are included, along with their WR (Warm Restarts) counterparts – with a cosine annealing learning rate schedule and per-layer learning rate multipliers (useful for pretraining). All optimizers are well-tested, and for me have yielded 3-4% F1-score improvements in already-tuned models for seizure classification.
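A minimal usage sketch of how these pieces might fit together; the constructor arguments (`weight_decays`, `lr_multipliers`, `use_cosine_annealing`, `total_iterations`) are assumptions inferred from the feature list above, not a copy of the package's documented API:

```python
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from keras_adamw import AdamW  # assumed import path for the package described here

# Small toy model with named layers, so per-layer settings can reference them
ipt = Input(shape=(16,))
x = Dense(32, activation='relu', name='dense_1')(ipt)
out = Dense(1, activation='sigmoid', name='output')(x)
model = Model(ipt, out)

# Assumed constructor arguments, mirroring the features described above:
#  - weight_decays: per-layer (normalized) weight decay values
#  - lr_multipliers: per-layer learning rate multipliers (e.g. a smaller LR
#    on pretrained layers)
#  - use_cosine_annealing + total_iterations: Warm Restarts cosine schedule,
#    with decay normalized over the run length
optimizer = AdamW(
    lr=1e-3,
    weight_decays={'dense_1': 1e-4, 'output': 1e-4},
    lr_multipliers={'dense_1': 0.5},
    use_cosine_annealing=True,
    total_iterations=1000,
)
model.compile(optimizer, loss='binary_crossentropy')

# Dummy data just to show the optimizer plugs into a normal fit() loop
X = np.random.randn(64, 16)
y = np.random.randint(0, 2, (64, 1))
model.fit(X, y, epochs=2, batch_size=32)
```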
submitted by /u/OverLordGoldDragon