Learn About Our Meetup

5000+ Members



Join our meetup to learn, connect, share, and get to know your Toronto AI community.



Browse the latest deep learning, AI, and machine learning postings from Indeed for the GTA.



If you are looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

[D] Follow The Regularized Leader (FTRL) algorithm to model user interests (which might change over time)?


Hi there,

I’ve been working on a news recommendation problem, and I’m implementing a factorization machine optimized by FTRL (the original paper is here: ) to model user interests in an online learning fashion. As users’ reading interests may change over time, I’d like the model to capture that change quickly. But the learning rate of FTRL decays according to the following equation:

FTRL per-coordinate learning rate
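For reference, the per-coordinate learning rate from the FTRL-Proximal paper takes the form:

```latex
\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^{2}}}
```

where \(g_{s,i}\) is the gradient for coordinate \(i\) at step \(s\), and \(\alpha, \beta\) are hyperparameters. Since the sum of squared gradients can only grow, \(\eta_{t,i}\) decreases monotonically.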

The learning rate decreases monotonically over time, so after training runs for a while, model updates become very small, making it hard to follow changes in users’ interests.

What I’m trying is to not accumulate the squared gradients from the beginning, but to weight recent ones more heavily. To do this, I changed the gradient-accumulating line in the pseudocode to the following, where lambda is a number in (0, 1), e.g. 0.99. This way, I hope gradients from long ago contribute little to the denominator.
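The proposed change amounts to an exponential moving average of squared gradients instead of a plain sum. A minimal sketch, where `update_accumulator` and the list-based state are illustrative stand-ins for whatever per-coordinate accumulator the implementation actually uses:

```python
def update_accumulator(n, g, lam=0.99):
    """Decayed squared-gradient accumulation for each coordinate i.

    Original FTRL accumulation:  n[i] += g[i] ** 2
    Proposed change:             n[i] = lam * n[i] + g[i] ** 2
    With lam in (0, 1), old gradients are down-weighted geometrically,
    so the effective learning rate no longer decays to zero.
    """
    for i, gi in enumerate(g):
        n[i] = lam * n[i] + gi ** 2
    return n
```

With `lam = 0.99`, a squared gradient from 100 steps ago is weighted by roughly 0.99**100 ≈ 0.37, so the denominator tracks recent gradient magnitudes rather than the full history.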

Could someone familiar with FTRL tell me whether this makes sense, or whether it is mathematically valid? The mathematics behind FTRL is beyond me.

Thanks in advance : )

changed gradient accumulating style

FTRL original implementation
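For context, here is a minimal Python sketch of the standard FTRL-Proximal per-coordinate update for logistic loss, following the widely cited pseudocode; the class name and hyperparameter defaults are illustrative, not the poster's actual code:

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Sketch of per-coordinate FTRL-Proximal for logistic loss."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # lazily stored weight state
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        """Closed-form weight with L1 sparsity from the stored z state."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        """x: dict of feature index -> value; returns P(y = 1)."""
        s = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log loss
            w = self._weight(i)
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g  # the line the question proposes to decay
```

The question's proposal would replace the last line with `self.n[i] = lam * self.n[i] + g * g`. Note this also changes `sigma`, which is derived from consecutive values of `n`, so the theoretical regret guarantees of FTRL would no longer apply as-is.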

submitted by /u/hunter7z

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics, and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.