[D] Consistency of Impact for Selected Features Across Model Types
I’ve been experimenting with DataRobot lately and I’ve noticed that, given a fixed set of features, the choice of the n (e.g. 10) “most impactful” features differs significantly from one model type to another. Since different algorithms can have different sensitivities to input data types and to specific effects present in the training set, how would one go about shortlisting a set of features that is consistently impactful across a variety of algorithms?
My current train of thought is stuck between two options:

1. Rank the top n features by model impact for several models (say, the top 5 best-performing ones) and select the features whose ranking changes the least across them; or
2. Examine the pairwise correlation with the target for all features, and if variables with low correlation end up selected in the final model, sanity-check for a nonlinear relationship using contour plots.
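For option 1, a minimal sketch of what I have in mind (all feature names and impact scores below are made-up placeholders, not DataRobot output): collect each model's normalized feature-impact scores, take each model's top-n set, and keep only the features that appear in every model's top n.

```python
from collections import Counter

def stable_features(importances_by_model, n=5):
    """Return features appearing in EVERY model's top-n impact list.

    importances_by_model: dict mapping model name -> {feature: impact score}
    (hypothetical structure; real tooling would export something similar).
    """
    top_sets = []
    for scores in importances_by_model.values():
        # Sort features by impact, descending, and keep the top n.
        top = sorted(scores, key=scores.get, reverse=True)[:n]
        top_sets.append(set(top))
    # Count how many models placed each feature in their top n.
    counts = Counter(f for s in top_sets for f in s)
    n_models = len(importances_by_model)
    return sorted(f for f, c in counts.items() if c == n_models)

# Hypothetical normalized impact scores from three model types
importances = {
    "gbm": {"age": 0.9, "income": 0.7, "tenure": 0.5, "region": 0.2},
    "rf":  {"age": 0.8, "tenure": 0.6, "income": 0.6, "region": 0.3},
    "glm": {"income": 0.9, "age": 0.7, "region": 0.4, "tenure": 0.1},
}
print(stable_features(importances, n=3))  # -> ['age', 'income']
```

A stricter version could compare full rankings (e.g. rank correlation between models) instead of set membership, which would penalize features that swing from rank 1 to rank n even when they stay inside the top n.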
Your thoughts/comments are much appreciated.
P.S. This discussion is motivated by my model validation team, which insists that I benchmark every model against logistic regression because that’s the only one they understand. The theme of the month is “Why are the variables selected in your model so different from those used in our traditional LR model?”