Is there a way to train a scikit classifier to make one prediction per N samples? [Project]
So, I originally posted this on StackOverflow, but I was told that my question was “too broad” and my thread was closed.
I’m working on replicating the research done in this paper.
I have a pandas DF which looks like this:
Date   In1   In2   In3   ...   Out
Day1    -1     1    -1   ...    -1
Day2     1    -1     1   ...     1
Day3    -1     1    -1   ...    -1
Day4    -1     1     1   ...     1
Day5     1     1     1   ...     1
...
Now, I’ve already done what they did in the paper, which is to say I’ve trained multiple models in scikit to predict "Out" based on all the feature columns "In1", ..., "In10".
However, these are daily predictions and I wanna see what would happen if I make weekly predictions.
Essentially, I want to use df.loc[Day1:Day5, In1:In10] to predict df.loc[Day5, "Out"].
Of course, "Out" would be redefined as cumulative returns over the last 5 days, rather than what it currently is, i.e. daily returns.
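For reference, here is a rough sketch of how that redefined target could be computed. It assumes a series of daily simple returns is available; the name "Ret" and the numbers are made up for illustration and are not from the paper or my actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily simple returns; the name "Ret" and the values are assumptions.
ret = pd.Series(
    [0.01, -0.02, 0.015, 0.03, -0.01, 0.02, -0.005],
    index=[f"Day{i}" for i in range(1, 8)],
    name="Ret",
)

N = 5  # window length in days

# Cumulative return over the trailing N days: prod(1 + r) - 1.
# The first N-1 days have no full window, so they come out as NaN.
cum_ret = (1 + ret).rolling(N).apply(np.prod, raw=True) - 1

# Same +1/-1 encoding as the current "Out" column.
# A window with exactly zero cumulative return would map to 0 here.
weekly_out = np.sign(cum_ret.dropna()).astype(int)
```

With this layout, weekly_out at Day5 summarises Day1 through Day5, which matches the df.loc[Day5, "Out"] indexing above.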
The problem is, I have absolutely no idea how to go about making a single prediction from N samples (in this case, 5).
My X_train/X_test are DataFrames with the "Out" column dropped, and my y_train/y_test are Series of the "Out" column. I prefer this because I’m not entirely comfortable with arrays.
Is there a way to make scikit use N samples for a single prediction?
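As far as I can tell, scikit-learn estimators treat each row of X as one sample, so one possible workaround is to flatten each 5-day window into a single wide row (5 days x 10 features = 50 columns) and fit as usual. Below is a minimal sketch of that idea that stays in DataFrames. The data is random, LogisticRegression is just a placeholder model, and the "_lagK" column names are my own invention.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Random stand-in data with the same layout as the real DataFrame (assumption).
rng = np.random.default_rng(0)
n_days, N = 200, 5
cols = [f"In{i}" for i in range(1, 11)]
df = pd.DataFrame(
    rng.choice([-1, 1], size=(n_days, len(cols))),
    columns=cols,
    index=[f"Day{i}" for i in range(1, n_days + 1)],
)
df["Out"] = rng.choice([-1, 1], size=n_days)  # stand-in for the 5-day target

# Build one wide row per window: lag0 is the current day, lag4 is four days earlier.
X = pd.concat(
    [df[cols].shift(lag).add_suffix(f"_lag{lag}") for lag in range(N)],
    axis=1,
).dropna()  # the first N-1 days have no complete window
y = df.loc[X.index, "Out"]

# Keep every Nth row so the windows do not overlap (Day5, Day10, ...).
X_weekly, y_weekly = X.iloc[::N], y.iloc[::N]

# Chronological split, then fit and predict as with the daily models.
split = int(len(X_weekly) * 0.75)
clf = LogisticRegression().fit(X_weekly.iloc[:split], y_weekly.iloc[:split])
pred = clf.predict(X_weekly.iloc[split:])
```

Each row of X_weekly then holds In1..In10 for five consecutive days, and the model makes one prediction per row. Dropping the iloc[::N] step would give overlapping windows instead, which yields more training rows but makes neighbouring samples strongly correlated.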
submitted by /u/JebusWasAnAlien