[D] Retrain your models, the Adam optimizer in PyTorch was fixed in version 1.3

I have noticed a small discrepancy between theory and the implementation of AdamW and, more generally, Adam. The epsilon in the denominator of the Adam update should not be scaled by the bias correction (Algorithm 2, L9-12). Only the running average of the gradient (m) and the running average of the squared gradients (v) should be scaled by their corresponding bias corrections.
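
Concretely, here is a minimal plain-Python sketch of the update as I read Algorithm 2 (scalar math only, not the actual PyTorch code), with epsilon left untouched by the bias corrections:

```python
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias corrections apply to m and v only.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps sits outside the bias correction, as written in the paper.
    param = param - lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```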

In the current implementation, the epsilon is scaled by the square root of bias_correction2. I have plotted this ratio as a function of step for beta2 = 0.999 and eps = 1e-8. In the early steps of optimization, the ratio deviates slightly from theory (denoted by the horizontal red line).
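
I can't reproduce the plot inline here, but roughly the difference comes down to where eps enters relative to the bias correction for v. A scalar paraphrase of the two behaviours (an assumed sketch, not the real PyTorch source; see the PR below for the actual diff):

```python
import math

def update_old(m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Pre-fix behaviour (paraphrased): eps is added to sqrt(v) before the
    # sqrt(bias_correction2) factor is folded into the step size.
    bc1, bc2 = 1 - beta1 ** t, 1 - beta2 ** t
    return lr * math.sqrt(bc2) / bc1 * m / (math.sqrt(v) + eps)

def update_fixed(m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Fixed behaviour (paraphrased): bias-correct v first, then add eps.
    bc1, bc2 = 1 - beta1 ** t, 1 - beta2 ** t
    return lr / bc1 * m / (math.sqrt(v / bc2) + eps)

# The two differ only in how eps enters the denominator; the gap is largest
# in the first steps and vanishes as bias_correction2 approaches 1.
m, v = 0.01, 1e-6
for t in (1, 10, 100, 1000):
    print(t, update_old(m, v, t), update_fixed(m, v, t))
```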

See more here: https://github.com/pytorch/pytorch/pull/22628

submitted by /u/Deepblue129