[D] Question pertaining to different methods of feature selection
I have been working on a binary classification problem for the past few months with over 200 features (all scalar data). I'm using dense neural networks in Keras. I obviously want to trim this down and have been researching different tools to assist with feature selection.
So far I've used the K-S test, but I'm wary that a feature having different distributions between the classes doesn't necessarily mean it will help the network differentiate between the two classes (or maybe it does? I'm unsure about this.)
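For reference, here's roughly how I've been applying the K-S test, feature by feature, with `scipy.stats.ks_2samp` (toy data below is just a stand-in for the real dataset; only feature 0 is made informative):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# Toy stand-in: 500 samples, 5 scalar features, binary labels.
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 5))
X[y == 1, 0] += 1.5  # shift feature 0 for class 1; the rest are pure noise

# Two-sample K-S test per feature: compare the feature's distribution in
# class 0 vs class 1. A large statistic / small p-value means the two
# class-conditional distributions differ.
for j in range(X.shape[1]):
    result = ks_2samp(X[y == 0, j], X[y == 1, j])
    print(f"feature {j}: KS={result.statistic:.3f}, p={result.pvalue:.3g}")
```

This ranks features by how separable their marginal distributions are, which is exactly the limitation I mentioned: it looks at each feature in isolation and can't capture interactions between features.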
While perusing Kaggle the other day I came across a user using the feature importances attribute from sklearn.ensemble.RandomForestClassifier. I haven't been able to find out exactly how this works – does anyone know how it's computed, and whether feature importances from a Random Forest classifier would also translate to use with neural networks?
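From what I've read, `feature_importances_` in sklearn is the mean decrease in impurity (Gini importance): for each feature, it sums how much the splits on that feature reduced node impurity across all trees, averaged and normalized to sum to 1. A minimal sketch on the same kind of toy data (illustrative names and sizes, not my real dataset):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy stand-in for a wide scalar dataset: only feature 0 carries signal.
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 10))
X[y == 1, 0] += 1.5

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ = mean decrease in impurity per feature,
# normalized so the values sum to 1 across all features.
ranking = np.argsort(forest.feature_importances_)[::-1]
print("features ranked by importance:", ranking)
```

One caveat I've seen mentioned: impurity-based importances can be biased toward high-cardinality or correlated features, so the ranking is a heuristic rather than ground truth for what a neural network would find useful.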
Another thing I have been playing around with is just feeding every feature into the network and using dropout on the input layer to let the network decide which features to use or discard.
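Concretely, that looks something like this in Keras (a sketch assuming TensorFlow/Keras; layer sizes are illustrative). Worth noting that input dropout regularizes the network rather than actually selecting features – every feature is still fed in at test time, so it doesn't tell you which ones to drop:

```python
# Sketch: dropout applied directly to the ~200 input features.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(200,)),          # ~200 scalar input features
    layers.Dropout(0.2),                 # randomly zero 20% of inputs per step
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```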
Anyway, does anyone have any knowledge/advice about the merits or drawbacks of any of these strategies, or suggestions for a different feature selection strategy to try? Thanks!
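One alternative I've come across that might bridge the gap is permutation importance: shuffle one feature column at a time and measure how much a fitted model's score drops. It's model-agnostic, so unlike impurity-based importances the same procedure applies to a neural network. A sketch with sklearn's `permutation_importance` on a simple stand-in model (a logistic regression here just to keep the example small):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Toy data: only feature 0 is informative.
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 10))
X[y == 1, 0] += 1.5

model = LogisticRegression().fit(X, y)

# Shuffle each column n_repeats times; importances_mean is the average
# drop in accuracy caused by destroying that feature's information.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print("ranked:", np.argsort(result.importances_mean)[::-1])
```

The same loop works with any model that exposes a score, so in principle you could wrap a trained Keras network the same way.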