
[D] Selection of randomly generated features

I have some raw data and a list of feature descriptors. Each feature descriptor defines a function together with its parameters and their domains. This lets me generate an almost unlimited number of random features. Obviously, most of them are garbage. My goal is to find a subset of features that is sufficient to train a "good enough" model. I suspect some features will be usable on their own, while others will only be useful in combination with other features.
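To make the setup concrete, here is a minimal sketch of what such a descriptor could look like. Everything in it is an assumption for illustration: the `FeatureDescriptor` class, the `window_mean` function, and the choice of inclusive integer ranges as parameter domains are hypothetical, not the actual descriptors.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class FeatureDescriptor:
    """Hypothetical descriptor: a function plus a domain for each parameter."""
    name: str
    func: Callable[..., float]           # called as func(raw, **params)
    domains: Dict[str, Tuple[int, int]]  # inclusive integer range per parameter

    def sample(self, rng: random.Random):
        """Draw one concrete feature: its parameters and a bound function."""
        params = {p: rng.randint(lo, hi) for p, (lo, hi) in self.domains.items()}
        return params, lambda raw: self.func(raw, **params)


def window_mean(raw: Sequence[float], start: int, width: int) -> float:
    """Illustrative feature function: mean of a slice of the raw data."""
    chunk = raw[start:start + width]
    return sum(chunk) / len(chunk)


# Each call to sample() yields a different random feature.
descriptor = FeatureDescriptor(
    "window_mean", window_mean, {"start": (0, 5), "width": (1, 5)}
)
params, feature = descriptor.sample(random.Random(0))
```

The domains above are chosen so that the slice is never empty for raw data of length 10 or more; real descriptors would need their own validity checks.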

My current approach is to generate n features, pick k of them, and train a tree-based model on those k. I then measure the model's score and distribute it among the k features in proportion to their feature importance, repeating this over a few rounds of cross-validation. Next I pick a different k of the n features and repeat, until each of the n features has been tested several times. Then I start over with n new features.
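The procedure above could be sketched roughly as follows. This is only one reading of it, under several assumptions: scikit-learn's `RandomForestRegressor` stands in for the unnamed tree-based model, the score is the default R^2, negative fold scores are clipped to zero so that bad subsets earn no credit, and `score_features` and its parameters are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate


def score_features(X, y, k=5, passes=3, cv=3, seed=0):
    """Return one averaged credit value per column of X.

    In each pass the n columns are shuffled and split into groups of k
    (the last group may be smaller), so every feature is tested exactly
    once per pass.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    credit = np.zeros(n)
    for _ in range(passes):
        perm = rng.permutation(n)
        for start in range(0, n, k):
            cols = perm[start:start + k]
            model = RandomForestRegressor(n_estimators=30, random_state=0)
            res = cross_validate(
                model, X[:, cols], y, cv=cv, return_estimator=True
            )
            for fold_score, est in zip(res["test_score"], res["estimator"]):
                # Split this fold's score among the group's features in
                # proportion to importance, averaged over the cv folds.
                share = max(fold_score, 0.0) * est.feature_importances_
                credit[cols] += share / cv
    return credit / passes
```

Shuffling before each pass means a feature meets different partners every time, which gives features that only work in combination at least some chance of landing in the same group.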

I am aware that there is a very high chance I will miss some great feature combinations, and I do not see how that could be avoided. Nevertheless, I would like to improve the process. One idea is to randomly pick some of the best-scoring features from earlier rounds and train the model on them together with new features. That way I might at least discover features that complement the ones already known to be good.
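That idea of carrying good features forward might look something like the sketch below. It assumes features are identified by hashable names and that earlier rounds produced a name-to-score mapping; `next_batch`, `n_elite`, and `top_frac` are hypothetical names, and drawing from the top 20% is an arbitrary choice.

```python
import random


def next_batch(scores, new_features, k, n_elite, top_frac=0.2, rng=None):
    """Pick k features: n_elite from the best-scored ones, the rest new.

    scores       -- dict mapping feature name to its score so far
    new_features -- list of freshly generated, untested feature names
    """
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Candidate pool: the top fraction, but never fewer than n_elite.
    pool = ranked[:max(n_elite, int(len(ranked) * top_frac))]
    elites = rng.sample(pool, min(n_elite, len(pool)))
    fresh = rng.sample(new_features, k - len(elites))
    return elites + fresh
```

Sampling the elites at random from a pool, rather than always taking the very top ones, keeps some variety in which good features the new candidates are paired with.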

Do you know of similar techniques that I could use for inspiration? Or do you think I should approach the problem completely differently? Any input is welcome.

submitted by /u/kalabele