[D] Recurrent networks without unrolling/data duplication
So, today my company decided to send me and a few colleagues to one of those machine learning courses that are trendy these days. The subject was time series prediction, and the structure followed everything I’ve seen in the past few years that always leaves me baffled.
For every lag they want to use, they create a new column with shifted data! To me this seems like a dealbreaker for two reasons. First, it effectively multiplies the memory usage by the number of lags we want to consider. Second, it means we can’t keep history longer than the lags considered, since the internal state doesn’t transfer between rows of the data matrix (I believe Keras’s `stateful=True` option keeps this state, but it requires aligning examples between consecutive batches). I don’t see how this can possibly work when you have high-resolution data and care about microstructure but also need to account for longer-term dynamics.
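To make the memory point concrete, here’s a minimal numpy sketch of what I mean (toy series and lag count are made up): the shifted-column construction copies the series once per lag, while a stride-based view produces the exact same matrix without copying anything.

```python
import numpy as np

series = np.arange(10.0)  # toy series
n_lags = 3

# Course-style lag matrix: each column is a shifted copy of the series,
# so memory grows roughly (n_lags + 1)-fold.
lagged = np.column_stack(
    [series[i:i + len(series) - n_lags] for i in range(n_lags + 1)]
)

# View-based alternative: same values, but it's a zero-copy view into `series`.
windows = np.lib.stride_tricks.sliding_window_view(series, n_lags + 1)

assert np.array_equal(lagged, windows)
assert np.shares_memory(windows, series)       # view, no duplication
assert not np.shares_memory(lagged, series)    # copies
```

Of course this only addresses the memory half of my complaint, not the limited-history half.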
So, what am I missing here? When I’m working with something like an exponentially weighted moving average, I can trivially set up a statsmodels model that fits the decay rate and thereby finds the optimal history length. Is there really no straightforward way in TensorFlow/PyTorch to look back at earlier values in the same data column and skip the whole data duplication issue? If not, is there a good reason for it?
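For reference, this is the kind of statsmodels one-liner I have in mind: `SimpleExpSmoothing` fits the smoothing level (i.e. the decay rate) by maximum likelihood, with no lagged columns anywhere (the random-walk series here is just illustrative data).

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(0)
series = rng.standard_normal(200).cumsum()  # illustrative random walk

# fit() optimizes smoothing_level, the EWMA decay rate, over the raw series.
fit = SimpleExpSmoothing(series).fit()
alpha = fit.params["smoothing_level"]
assert 0.0 <= alpha <= 1.0
```

The model consumes the series as a single column and the "history length" falls out of the fitted decay rate.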
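And here is a sketch of what I’d hope the deep learning answer looks like, assuming PyTorch (shapes and sizes are made up): feed the series in chunks and carry the LSTM’s `(h, c)` state forward between chunks, so effective history isn’t capped at the chunk length and nothing gets duplicated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
series = torch.randn(1000, 1)  # (time, features), toy data
lstm = nn.LSTM(input_size=1, hidden_size=8, batch_first=True)

chunk_len = 50
state = None       # (h, c); None means start from zeros
outputs = []
for start in range(0, series.size(0), chunk_len):
    chunk = series[start:start + chunk_len].unsqueeze(0)  # (1, chunk_len, 1)
    out, state = lstm(chunk, state)
    # Detach so gradients don't flow across chunk boundaries
    # (the usual truncated-BPTT pattern); the state values still carry over.
    state = tuple(s.detach() for s in state)
    outputs.append(out)

full = torch.cat(outputs, dim=1)
assert full.shape == (1, 1000, 8)
```

If this is essentially the accepted answer, I’d love to know why the lag-column approach is what every course teaches instead.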