
[D] Handling Lag Features for different time frames.


I’m currently working on a project that started as a time series problem, which I transformed into a classification problem to get more fine-grained predictions. Rather than predicting one aggregated figure per day, as the time series formulation would, I classify single instances; aggregating those predictions gives back the daily figure.

To summarize, the setting is a scheduling problem: an employee is assigned to a shift, and the prediction is whether the employee will be absent from that scheduled shift.

I’m trying to train two different models to be used at two different points in time. One is basically a 24h model that predicts instances scheduled for the next day; the other predicts the very same instances one week beforehand. Below, I tried to illustrate the problem on a timeline; I hope this helps.

I started with the former model, which seems a bit easier because all the information that can be available is available at the prediction point. I did some feature engineering, mostly lag features computed over the last recorded instances. Since the lag features seem to contribute quite well to the model’s performance, I wanted to re-use them in the ‘one-week’ model. However, I don’t know how to calculate them correctly (if that even makes sense in this case).
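
To make the 24h setup concrete, here is a minimal sketch of one such lag feature: the absence rate over an employee’s last k recorded shifts before the prediction point. The function name, the `(shift_start, was_absent)` record layout, the dates, and the rule that a shift counts as recorded once its start time lies before the prediction point are all assumptions for illustration, not the actual pipeline.

```python
from datetime import datetime, timedelta

def last_k_absence_rate(history, prediction_time, k=3):
    """Absence rate over the last k shifts recorded before prediction_time.

    history: list of (shift_start, was_absent) tuples for ONE employee
             (hypothetical layout, assumed for this sketch).
    Returns None when no shift has been recorded yet.
    """
    # Only shifts that started before the prediction point have a known outcome.
    known = sorted(h for h in history if h[0] < prediction_time)
    recent = known[-k:]
    if not recent:
        return None
    return sum(1 for _, absent in recent if absent) / len(recent)

# Made-up history for one employee.
history = [
    (datetime(2020, 3, 2, 8), False),
    (datetime(2020, 3, 4, 8), True),
    (datetime(2020, 3, 6, 8), False),
    (datetime(2020, 3, 8, 8), True),
]

target_shift = datetime(2020, 3, 10, 8)
prediction_time = target_shift - timedelta(hours=24)  # the 24h model's prediction point
rate = last_k_absence_rate(history, prediction_time, k=3)
```

With this toy history, the three most recent recorded shifts are the ones on March 4, 6 and 8, so the feature is built from two absences out of three shifts.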

As you can see in the second timeline, there is a one-week gap between the prediction point and the scheduled instance I’d like to predict, and more instances could be scheduled within that gap. I’m not sure how to deal with this. If I ignored the gap week completely and kept calculating the lag features the same way as in the 24h model, I feel it would not work well (although I haven’t tried it yet).
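
One possible way to frame the gap, sketched under assumptions rather than offered as a tested answer: compute the lag features relative to the prediction point (shift start minus the horizon) instead of relative to the shift itself, so the one-week model only ever sees outcomes that would really be known a week ahead. The function name, record layout, dates, and the extra `unseen_in_gap` count are hypothetical.

```python
from datetime import datetime, timedelta

def horizon_lag_features(history, shift_start, horizon, k=3):
    """Lag features using only outcomes known at shift_start - horizon.

    history: list of (shift_start, was_absent) tuples for ONE employee
             (hypothetical layout, assumed for this sketch).
    """
    cutoff = shift_start - horizon  # the prediction point
    known = sorted(h for h in history if h[0] < cutoff)
    recent = known[-k:]
    rate = sum(1 for _, absent in recent if absent) / len(recent) if recent else None
    # Shifts inside the gap: scheduled, but their outcome is not yet observable.
    unseen = sum(1 for t, _ in history if cutoff <= t < shift_start)
    return {"absence_rate": rate, "unseen_in_gap": unseen}

# Made-up history for one employee.
history = [
    (datetime(2020, 3, 2, 8), False),
    (datetime(2020, 3, 4, 8), True),
    (datetime(2020, 3, 6, 8), False),
    (datetime(2020, 3, 8, 8), True),
]
target_shift = datetime(2020, 3, 10, 8)

day_ahead = horizon_lag_features(history, target_shift, timedelta(hours=24))
week_ahead = horizon_lag_features(history, target_shift, timedelta(days=7))
```

For the same target shift, the day-ahead features draw on all four past shifts, while the week-ahead features can only use the March 2 shift and leave three shifts unseen in the gap. Training the one-week model on features built this way would at least keep training and prediction consistent; whether such stale lags still carry signal is something only an experiment can show.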

Unfortunately, I couldn’t find any literature on this problem, or a Kaggle competition where it came up. I’m therefore short on ideas for how to handle it and would appreciate any suggestions.

Thanks very much!

submitted by /u/babuunn