Is there a way to train a scikit classifier to make one prediction per N samples? [Project]

So, I originally posted this on StackOverflow, but I was told that my question was “too broad” and my thread was closed.

I’m working on replicating the research done in this paper.

I have a pandas DF which looks like this:

Date   In1   In2   In3   ...   Out
Day1    -1     1    -1   ...    -1
Day2     1    -1     1   ...     1
Day3    -1     1    -1   ...    -1
Day4    -1     1     1   ...     1
Day5     1     1     1   ...     1
...

Now, I’ve already done what they did in the paper: I’ve trained multiple models in scikit to predict "Out" from all the feature columns "In1", ..., "In10".

However, these are daily predictions, and I want to see what would happen if I made weekly predictions instead.

Essentially, I want to use df.loc[Day1:Day5, In1:In10] to predict df.loc[Day5, "Out"].
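
To make the idea concrete, here is a minimal sketch of one possible approach: flatten each block of N rows into a single wide row, so the classifier still sees one sample per prediction. It assumes non-overlapping 5-day blocks and uses only the three feature columns shown in the table above; `make_windows`, `X_win`, `y_win` and the `_d1` ... `_d5` column suffixes are hypothetical names introduced for illustration, not anything from the paper or from scikit.

```python
import pandas as pd


def make_windows(X, y, n=5):
    """Flatten each non-overlapping block of n rows of X into one row.

    The target for a block is y at the block's last row. Leftover rows
    that do not fill a whole block are dropped. (Hypothetical helper.)
    """
    n_blocks = len(X) // n
    values = X.to_numpy()[: n_blocks * n]
    # Row-major reshape: day 1 features, then day 2 features, and so on.
    flat = values.reshape(n_blocks, n * X.shape[1])
    cols = [f"{c}_d{k + 1}" for k in range(n) for c in X.columns]
    last = slice(n - 1, n_blocks * n, n)  # last row of every block
    X_win = pd.DataFrame(flat, index=X.index[last], columns=cols)
    y_win = y.iloc[last]
    return X_win, y_win


# The five rows shown in the table above (In1..In3 only).
df = pd.DataFrame(
    {
        "In1": [-1, 1, -1, -1, 1],
        "In2": [1, -1, 1, 1, 1],
        "In3": [-1, 1, -1, 1, 1],
        "Out": [-1, 1, -1, 1, 1],
    },
    index=["Day1", "Day2", "Day3", "Day4", "Day5"],
)

X_win, y_win = make_windows(df.drop(columns="Out"), df["Out"], n=5)
print(X_win.shape)  # one row, 5 days x 3 features
print(y_win)
```

With this layout, `X_win` and `y_win` stay a DataFrame and a Series, so they could be passed to an estimator's `fit` the same way as the daily data. The trade-off is that the number of training samples shrinks by a factor of N while the number of features grows by the same factor.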

Of course, "Out" would be redefined as cumulative returns over the last 5 days, rather than what it currently is i.e. daily returns.
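
As a rough sketch of that redefinition, assuming the raw daily simple returns are available as a Series (the numbers below are made up for illustration, and `rolling_cumulative_return` is a hypothetical helper), the 5-day cumulative return can be computed by compounding and then reduced to a +1/-1 label:

```python
import numpy as np
import pandas as pd


def rolling_cumulative_return(daily_returns, n=5):
    """Compound simple daily returns over a trailing window of n days.

    Days without a full n-day history are dropped. (Hypothetical helper.)
    """
    growth = (1.0 + daily_returns).rolling(n).apply(np.prod, raw=True)
    return (growth - 1.0).dropna()


# Made-up daily returns, purely for illustration.
daily = pd.Series(
    [0.01, -0.02, 0.03, 0.00, 0.01],
    index=["Day1", "Day2", "Day3", "Day4", "Day5"],
)

cum = rolling_cumulative_return(daily, n=5)
# Turn the cumulative return into a +1 / -1 direction label.
labels = pd.Series(np.where(cum > 0, 1, -1), index=cum.index)
print(cum)
print(labels)
```

Counting a cumulative return of exactly zero as -1 is an arbitrary choice in this sketch.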

The problem is, I have absolutely no idea how to go about making a single prediction from N samples (in this case, 5).

My X_train/X_test are DataFrames with the "Out" column dropped, and my y_train/y_test are Series of the "Out" column. I prefer this because I’m not entirely comfortable with arrays.

Is there a way to make scikit use N samples for a single prediction?

submitted by /u/JebusWasAnAlien