[D] Ideas and advice on how to improve the accuracy score using Random Forest and Extra Trees classifiers.
My project is classification of 2D ultrasound images; the full data set contains approximately 1,000 images. For this analysis, 250 features were handcrafted by computing various parameters over the whole images or over horizontal slices of them. For feature selection, SelectKBest with chi2 is used to pick the best 50 features. Balanced accuracy is computed with sklearn.model_selection.cross_val_score, using Random Forest and Extra Trees classifiers (1000 trees each).

What confuses me is this: when I split the data randomly with train_test_split in a 9:1 ratio and run cross_val_score only on the 90% training portion, the highest balanced accuracy is 80% with Random Forest and 85% with Extra Trees. But when I skip train_test_split and run cross_val_score on the full data set, the highest score is no more than 60%. I expected better results with more data, but the opposite happened. I would appreciate any advice or ideas on how to improve the accuracy score.
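Roughly, the two evaluations look like the sketch below. This is only an approximation of my script: the placeholder data, the 5-fold CV, the random seeds, and wrapping SelectKBest and the classifier in a single Pipeline are my simplifications here, not necessarily how the original code is wired.

```python
# Minimal sketch of the two evaluation setups described above.
# X/y are placeholders standing in for the ~1000 images x 250 handcrafted features.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

X = np.random.rand(1000, 250)           # placeholder features (non-negative, as chi2 requires)
y = np.random.randint(0, 2, size=1000)  # placeholder class labels

# SelectKBest (top 50 features by chi2) followed by the tree ensemble (1000 trees).
model = make_pipeline(
    SelectKBest(chi2, k=50),
    ExtraTreesClassifier(n_estimators=1000, random_state=0),
)

# Setup 1: 9:1 train_test_split, then cross-validate on the 90% portion only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)
scores_90 = cross_val_score(model, X_train, y_train, cv=5, scoring="balanced_accuracy")

# Setup 2: no initial split, cross-validate on the full data set.
scores_full = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")

print("90% subset:", scores_90.mean(), "full data:", scores_full.mean())
```

The same pipeline is used for Random Forest by swapping in RandomForestClassifier; everything else stays the same.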
submitted by /u/glitchdot2