
[D] Ideas and advice on how to improve the accuracy score using Random Forest and Extra Trees classifiers.

My project is classification of 2D ultrasound images; the full data set contains approximately 1,000 images. For this analysis, 250 features were handcrafted by calculating different parameters over the whole images or over horizontal slices of them. For feature selection, SelectKBest with chi2 is used to pick the best 50 features. To calculate balanced accuracy I am using sklearn.model_selection.cross_val_score with Random Forest and Extra Trees classifiers (1,000 trees each).

What confuses me is this: when I split the data randomly with train_test_split at a 9:1 ratio and run cross_val_score on only 90% of the data, the highest accuracy is 80% with Random Forest and 85% with Extra Trees. But when I skip train_test_split and compute the balanced accuracy score on the full data set, the highest score is no more than 60%. I expected to get better results when I included more data, but the opposite happened. I would appreciate any advice or ideas on how to improve the accuracy score.
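For reference, a minimal sketch of the setup described above, using synthetic data in place of the handcrafted ultrasound features (the sample count, feature count, k=50, 1,000 trees, and 9:1 split come from the post; everything else, including the informative-feature count and random seeds, is an assumption). Putting SelectKBest inside a Pipeline means the chi2 scores are recomputed on each training fold, which avoids selection leakage into the cross-validation estimate:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the handcrafted feature matrix:
# ~1,000 images x 250 features. chi2 requires non-negative
# feature values, so we take absolute values here.
X, y = make_classification(n_samples=1000, n_features=250,
                           n_informative=30, random_state=0)
X = np.abs(X)

# 9:1 train/test split, as in the post.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y)

# Feature selection inside the pipeline so it is re-fit per fold.
pipe = Pipeline([
    ("select", SelectKBest(chi2, k=50)),
    ("clf", ExtraTreesClassifier(n_estimators=1000, random_state=0)),
])

scores = cross_val_score(pipe, X_train, y_train, cv=5,
                         scoring="balanced_accuracy")
print(scores.mean())
```

If the selection step is instead fit once on all the data before cross-validation, the fold scores can be optimistically biased, which is one common reason scores shift when the evaluated subset changes.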

submitted by /u/glitchdot2