[D] Tips on improving random forest predictive accuracy when # of features is really low?
Working on a random forest predictive model with a continuous response variable and two continuous features. Normally when I do RF projects I use some sort of feature selection method to choose which features to use, then fit the RF model on those features. Then to test accuracy and related metrics I use cross validation, confusion matrices (on classification projects), etc.
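For a continuous response the cross-validation step would score with a regression metric like R² rather than a confusion matrix. A minimal sketch of that workflow in sklearn, using made-up synthetic data in place of the real two features:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Placeholder data: two continuous features and a continuous response
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

rf = RandomForestRegressor(n_estimators=200, random_state=0)

# 5-fold CV with R^2 scoring (a confusion matrix only applies to classification)
scores = cross_val_score(rf, X, y, cv=5, scoring="r2")
print("mean CV R^2:", round(scores.mean(), 3))
```

The same pattern works with `scoring="neg_mean_squared_error"` if you care more about raw error size than explained variance.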
However, in this case I only have two given features. I don’t want my entire project to be just running an RF model on those two features. I’m thinking gradient boosting is what I should learn? I also think I should play around with the number of estimators and the tree depth of the RF. I’m using sklearn in Python if that helps.
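Tuning the number of estimators and depth, and comparing against gradient boosting, can both be done with sklearn's built-in tools. A hedged sketch on synthetic stand-in data (the parameter grid values are just illustrative starting points, not recommendations):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Placeholder data standing in for the real two features
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=300)

# Grid-search over tree count and depth (illustrative values)
param_grid = {"n_estimators": [100, 300], "max_depth": [3, 6, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("RF best params:", search.best_params_)
print("RF best CV R^2:", round(search.best_score_, 3))

# Gradient boosting as a comparison model, default settings
gb = GradientBoostingRegressor(random_state=0)
print("GB mean CV R^2:", round(cross_val_score(gb, X, y, cv=5).mean(), 3))
```

With only two features there is little to gain from feature selection, so most of the available accuracy comes from hyperparameter tuning and model choice like this.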
Any other suggestions? Obviously this type of problem/challenge is an unexplored area for me, so looking for best practices on how to add to my data science toolkit. Thanks!
submitted by /u/truryce