
[D] Question pertaining to different methods of feature selection

I have been working on a binary classification problem for the past few months with over 200 features (all scalar data). I'm using dense neural networks with Keras. I obviously want to trim this down and have been researching different tools to assist with feature selection.

So far I've used the K-S test, but I'm wary that a feature having different distributions between classes doesn't necessarily mean that it will help the network differentiate between the 2 classes (or maybe it does? Unsure about this.)
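For concreteness, here is a minimal sketch of the K-S approach on toy data (the data, feature count, and shift are made up for illustration): rank each feature by the two-sample K-S statistic between the two classes, so features whose class-conditional distributions differ most come out on top.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))        # toy data: 5 scalar features
y = rng.integers(0, 2, size=500)     # binary labels
X[y == 1, 0] += 1.0                  # make only feature 0 class-dependent

# K-S statistic per feature between the class-0 and class-1 samples
scores = [ks_2samp(X[y == 0, j], X[y == 1, j]).statistic
          for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]   # features sorted by distributional separation
print(ranking[0])                    # the shifted feature should rank first
```

Note this only measures marginal, per-feature separation; it can't see interactions, which is part of the worry above.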

While perusing Kaggle the other day I came across a user using the feature_importances_ attribute from sklearn.ensemble.RandomForestClassifier. I haven't been able to find out exactly how it works – does anyone know, and whether or not the feature importances from a Random Forest Classifier would also translate to use with neural networks?
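As I understand it (happy to be corrected), sklearn's feature_importances_ is impurity-based: each feature's score is the total decrease in Gini impurity from splits on that feature, averaged over all trees and normalized to sum to 1. A toy sketch with made-up data where only one feature carries signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
# Label depends (almost) only on feature 0
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = clf.feature_importances_
print(importances)  # feature 0 should dominate; scores sum to 1
```

One known caveat: impurity-based importances are biased toward high-cardinality features and are computed on training data, so they can overstate noisy features.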

Another thing I have been playing around with is just feeding every feature into the network and using dropout to have the network select which features to use or discard.
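For reference, a minimal Keras sketch of what I mean (layer sizes and dropout rates are arbitrary placeholders): dropout applied directly to the input layer randomly zeroes features during training. Worth noting that standard dropout drops features at random per batch, so it acts as a regularizer rather than a selector and won't by itself report which features matter.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 200  # matches the ~200 scalar features in my problem

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dropout(0.2),                    # input-level dropout
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                    # hidden-layer dropout
    layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```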

Anyways, does anyone have any knowledge/advice about the merits or drawbacks of any of these strategies, or any advice for a different feature selection strategy to use? Thanks!

submitted by /u/Gkg14