[D] Question pertaining to different methods of feature selection

I have been working on a binary classification problem for the past few months with over 200 features (all scalar data). I'm using dense neural networks with Keras. I obviously want to trim this down and have been researching different tools to assist with feature selection.
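For context, here's roughly the setup I mean (layer sizes are placeholders, not my exact architecture):

    from tensorflow import keras
    from tensorflow.keras import layers

    n_features = 200  # placeholder; I have a bit over 200

    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # binary output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])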

So far I've used the K-S test, but I'm wary that a feature having different distributions between the two classes doesn't necessarily mean it will help the network differentiate between them (or maybe it does? I'm unsure about this.)
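Roughly what I'm doing per feature (a sketch, with X and y standing in for my feature matrix and binary labels):

    from scipy.stats import ks_2samp

    # X: (n_samples, n_features) array of scalar features; y: binary labels in {0, 1}
    def ks_rank(X, y):
        scores = []
        for j in range(X.shape[1]):
            stat, pval = ks_2samp(X[y == 0, j], X[y == 1, j])
            scores.append((j, stat, pval))
        # larger statistic = more separated class-conditional distributions
        return sorted(scores, key=lambda t: t[1], reverse=True)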

While perusing Kaggle the other day I came across a user using the feature_importances_ attribute from sklearn.ensemble.RandomForestClassifier. I haven't been able to find out exactly how it works. Does anyone know how it's computed, and whether the feature importances from a Random Forest classifier would also translate to use with neural networks?
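For reference, the pattern I saw was roughly this (my paraphrase, not the user's exact code; hyperparameters are placeholders):

    from sklearn.ensemble import RandomForestClassifier

    # X: (n_samples, n_features) feature matrix; y: binary labels
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X, y)

    # one score per feature; sklearn derives these from impurity decreases in the trees
    ranked = sorted(enumerate(rf.feature_importances_),
                    key=lambda t: t[1], reverse=True)
    print(ranked[:20])  # the 20 highest-scoring (index, importance) pairs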

Another thing I have been playing around with is just feeding every feature into the network and using dropout to let the network decide which features to use or discard.
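Concretely, something like this, with dropout applied directly to the inputs (rates are arbitrary):

    from tensorflow import keras
    from tensorflow.keras import layers

    n_features = 200  # placeholder
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dropout(0.2),  # zeroes a random subset of input features each training step
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])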

Anyway, does anyone have any thoughts on the merits or drawbacks of these strategies, or advice on a different feature selection approach to try? Thanks!

submitted by /u/Gkg14