Is there a way to train a scikit-learn classifier to make one prediction per N samples? [Project]
So, I originally posted this on StackOverflow, but I was told that my question was “too broad” and my thread was closed.
I’m working on replicating the research done in this paper.
I have a pandas DF which looks like this:
Date   In1   In2   In3   ...   Out
Day1    -1     1    -1   ...    -1
Day2     1    -1     1   ...     1
Day3    -1     1    -1   ...    -1
Day4    -1     1     1   ...     1
Day5     1     1     1   ...     1
...
Now, I’ve already done what they did in the paper, which is to say I’ve trained multiple models in scikit-learn to predict "Out" from all the feature columns "In1", ..., "In10".
However, these are daily predictions and I wanna see what would happen if I make weekly predictions.
Essentially, I want to use df.loc[Day1:Day5, In1:In10] to predict df.loc[Day5, "Out"].
Of course, "Out" would be redefined as the cumulative return over the last 5 days, rather than what it currently is, i.e. the daily return.
The problem is, I have absolutely no idea how to go about making a single prediction from N samples (in this case, 5).
My X_train/X_test are DataFrames with the "Out" column dropped, and my y_train/y_test are Series holding the "Out" column. I prefer this because I’m not entirely comfortable with arrays.
Is there a way to make scikit use N samples for a single prediction?
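To make the setup concrete, here is a rough sketch of one possible way to frame it: flatten each block of N consecutive rows into a single wide row, so the classifier still sees one sample per prediction. Everything here is an assumption for illustration only: the helper name make_windows, the random toy data, the placeholder labels standing in for the redefined weekly "Out", and the choice of DecisionTreeClassifier. It is not the method from the paper.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def make_windows(X, y, n=5, step=None):
    """Flatten each block of n consecutive rows of X into one row.

    Hypothetical helper. The target for a block is y at the block's
    last row, so y should already hold the redefined (weekly) "Out".
    step=n gives non-overlapping blocks; step=1 gives rolling windows.
    """
    step = n if step is None else step
    rows, targets, index = [], [], []
    for end in range(n, len(X) + 1, step):
        block = X.iloc[end - n:end]
        rows.append(block.to_numpy().ravel())  # n * n_features values
        targets.append(y.iloc[end - 1])
        index.append(X.index[end - 1])
    # Column names record how many days before the last day each value is.
    columns = [f"{col}_lag{n - 1 - i}" for i in range(n) for col in X.columns]
    X_w = pd.DataFrame(rows, index=index, columns=columns)
    y_w = pd.Series(targets, index=index, name=y.name)
    return X_w, y_w


# Toy data only: 250 days, 10 features of -1/1, and random placeholder labels.
rng = np.random.default_rng(0)
days = [f"Day{i}" for i in range(1, 251)]
features = [f"In{i}" for i in range(1, 11)]
X = pd.DataFrame(rng.choice([-1, 1], size=(250, 10)), index=days, columns=features)
y = pd.Series(rng.choice([-1, 1], size=250), index=days, name="Out")

X_w, y_w = make_windows(X, y, n=5)

# Keep the split in time order: earlier windows train, later windows test.
split = 40
X_train, X_test = X_w.iloc[:split], X_w.iloc[split:]
y_train, y_test = y_w.iloc[:split], y_w.iloc[split:]

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)  # one prediction per 5-day block
```

With this layout X_train/X_test stay DataFrames and y_train/y_test stay Series, just with one row per 5-day block instead of one row per day. The obvious trade-off is that the feature count grows by a factor of N while the number of samples shrinks by the same factor (unless overlapping windows are used).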
submitted by /u/JebusWasAnAlien