[D] Using the Follow The Regularized Leader (FTRL) algorithm to model user interests that might change over time?
I’ve been working on a news recommendation problem, and I’m implementing a factorization machine optimized by FTRL (the original paper is here: https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/41159.pdf) to model user interests in an online-learning fashion. Since users’ reading interests may change over time, I’d like the model to capture such changes quickly. But FTRL’s per-coordinate learning rate decays according to

    eta_{t,i} = alpha / (beta + sqrt(sum_{s=1}^{t} g_{s,i}^2))

Because the sum of squared gradients in the denominator only grows, the learning rate decreases monotonically over time. So after training for a while, the model updates become very slow, which makes it hard to follow the users’ shifting interests.
What I’m trying is this: instead of accumulating the squares of the gradients from the beginning, only accumulate recent ones. To do this, I changed the gradient-accumulating line in the pseudocode (n_i <- n_i + g_i^2) to

    n_i <- lambda * n_i + g_i^2

where lambda is a number in (0, 1), e.g. 0.99. This way, I hope the gradients from long ago make little contribution to the denominator.
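To make the idea concrete, here is a minimal per-coordinate sketch of the FTRL-Proximal update (Algorithm 1 in the linked paper) with my decay bolted on. The parameter names alpha, beta, l1, l2 follow the paper; `decay` plays the role of my lambda, and decay=1.0 recovers the original algorithm. This is just an illustration of what I mean, not a vetted implementation:

```python
import math

def ftrl_update(z, n, w, g, alpha=0.1, beta=1.0, l1=1.0, l2=1.0, decay=1.0):
    """One per-coordinate FTRL-Proximal step with an optional decay
    on the accumulated squared gradients. decay=1.0 gives the paper's
    Algorithm 1; decay < 1 is the modification I'm asking about."""
    # With decay == 1 this is exactly the paper's n_i <- n_i + g_i^2.
    n_new = decay * n + g * g
    # Learning-rate schedule term. Note: with decay < 1, sqrt(n_new) can be
    # smaller than sqrt(n), making sigma negative -- this is exactly where
    # the paper's telescoping-sum argument no longer applies as-is.
    sigma = (math.sqrt(n_new) - math.sqrt(n)) / alpha
    z_new = z + g - sigma * w
    # Closed-form solution of the per-coordinate FTRL-Proximal objective.
    if abs(z_new) <= l1:
        w_new = 0.0
    else:
        w_new = -(z_new - math.copysign(l1, z_new)) / (
            (beta + math.sqrt(n_new)) / alpha + l2
        )
    return z_new, n_new, w_new
```

With constant gradient magnitude, the decayed accumulator is bounded by 1/(1 - lambda) instead of growing without limit, so the effective learning rate alpha / (beta + sqrt(n)) stays bounded away from zero, which is the behavior I want.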
Could someone familiar with FTRL tell me whether this makes sense, i.e. whether it is mathematically valid? The mathematics behind FTRL is beyond me.
Thanks in advance : )