
[R] Why not use e.g. SGD coordinate-wise: learning rate ~ sqrt(variance(theta)/variance(g)) ?

I am working on estimating the position of the minimum by modelling where the linear trend of the gradients intersects zero. A simple approximation (corr(g, theta) = 1) leads to what looks like an obvious choice:

learning rate ~ sqrt(var(theta)/var(g))

It is proportional to the width of the displacement of theta, and inversely proportional to the width of the displacement of the gradients. Assuming they are perfectly correlated (corr(g, theta) = 1), such a learning rate would take us exactly to the g = 0 minimum of a parabola in one step.
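That one-step claim can be checked directly for a one-dimensional parabola (a sketch; $\lambda$ for the curvature and $\theta^*$ for the minimum are symbols introduced here for illustration, not taken from the post):

$$f(\theta) = \tfrac{\lambda}{2}(\theta - \theta^*)^2, \qquad g = f'(\theta) = \lambda(\theta - \theta^*).$$

Since $g$ is a linear function of $\theta$ (so corr(g, theta) = 1), we have $\mathrm{var}(g) = \lambda^2\,\mathrm{var}(\theta)$, hence

$$\eta = \sqrt{\frac{\mathrm{var}(\theta)}{\mathrm{var}(g)}} = \frac{1}{\lambda}, \qquad \theta - \eta\,g = \theta - (\theta - \theta^*) = \theta^*,$$

i.e. a single step with this learning rate is exactly the Newton step and lands on the g = 0 minimum.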

Adaptive variance estimation is just a matter of maintaining two exponential moving averages, of the value and of its square, so we can cheaply do it coordinate-wise in SGD, getting second-order adaptation of the learning rate independently for each coordinate (5th page here).
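A minimal NumPy sketch of that recipe (all names, the warm-up phase, and the damping factor base_lr are illustrative assumptions, not taken from the referenced paper):

import numpy as np

def sgd_var_lr(theta, grad_fn, n_steps=1000, beta=0.9, eps=1e-8,
               warmup=20, warmup_lr=0.01, base_lr=0.5):
    """Coordinate-wise SGD with learning rate ~ sqrt(var(theta)/var(g)).

    Each variance is estimated from two exponential moving averages
    (of the value and of its square), as described in the post.
    """
    theta = np.asarray(theta, dtype=float).copy()
    m_th  = np.zeros_like(theta); m_th2 = np.zeros_like(theta)
    m_g   = np.zeros_like(theta); m_g2  = np.zeros_like(theta)

    for t in range(n_steps):
        g = grad_fn(theta)
        # exponential moving averages of theta, theta^2, g, g^2
        m_th  = beta * m_th  + (1 - beta) * theta
        m_th2 = beta * m_th2 + (1 - beta) * theta ** 2
        m_g   = beta * m_g   + (1 - beta) * g
        m_g2  = beta * m_g2  + (1 - beta) * g ** 2
        if t < warmup:
            # plain SGD while the averages accumulate some history
            lr = warmup_lr
        else:
            # coordinate-wise variances: var(x) = E[x^2] - E[x]^2
            var_th = np.maximum(m_th2 - m_th ** 2, 0.0)
            var_g  = np.maximum(m_g2  - m_g ** 2,  0.0)
            # learning rate ~ sqrt(var(theta) / var(g)) per coordinate
            lr = base_lr * np.sqrt((var_th + eps) / (var_g + eps))
        theta = theta - lr * g
    return theta

# toy usage: noisy gradient of f(x) = 0.5 * sum(lam * (x - x_star)^2)
lam, x_star = np.array([1.0, 10.0]), np.array([3.0, -2.0])
noisy_grad = lambda x: lam * (x - x_star) + 0.01 * np.random.randn(2)
print(sgd_var_lr(np.zeros(2), noisy_grad))

On a noisy quadratic the estimated per-coordinate rate settles near base_lr / curvature, i.e. a damped Newton step, which is the behaviour described above; near the minimum var(theta) shrinks while var(g) is dominated by noise, so the step size decays on its own.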

The square root of the mean squared gradient in the denominator is popular (e.g. RMSprop, Adam), but has anybody seen the use of variance in SGD optimizers?

submitted by /u/jarekduda

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics, and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.