[D] Using A Classifier’s Feature Importance Output To Approximate a Choice Model and Rank Priority of Features
Was wondering if anybody had ever done this. I’ve searched quite a bit but haven’t come up with anything.
I have a bunch of hotel data with different features (amenities like a pool, workout room, number of rooms, and a whole bunch of others), plus records of which hotel people picked when choosing between options (hotel A vs. B, etc.). I need to figure out which features mattered most in people choosing certain hotels over others. Classic customer choice.
I’m not sure a discrete choice model is the best fit for this, though I’m exploring it. Essentially, I’m trying to figure out whether I can treat this as a supervised learning problem and use it to find which of these features weigh most heavily in hotel choices.
I’ve looked at classifier feature importance on a lot of past projects (mainly xgboost). If I got creative and properly vectorized the feature lists for the hotels in each comparison, in a way that let me train a classifier on the choices, I could then look at which features the classifier found most important and treat that as a measure of what drives the choice.
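To make the "vectorize features between hotels" idea concrete, here is a rough sketch of one possible encoding: each choice becomes the difference between the two hotels' feature vectors, with a 0/1 label for which one was picked. The hotel names, feature names, values, and the `encode_pairs` helper are all made up for illustration, and this is only one of several ways the pairs could be encoded.

```python
import random

# Hypothetical feature names and hotel data, invented for this sketch.
FEATURES = ["has_pool", "has_gym", "num_rooms"]

HOTELS = {
    "A": {"has_pool": 1, "has_gym": 0, "num_rooms": 120},
    "B": {"has_pool": 0, "has_gym": 1, "num_rooms": 80},
    "C": {"has_pool": 1, "has_gym": 1, "num_rooms": 200},
}

# Each record is (chosen hotel, rejected hotel). Also invented.
CHOICES = [("A", "B"), ("C", "A"), ("C", "B"), ("A", "B")]


def encode_pairs(choices, hotels, features, seed=0):
    """Turn (chosen, rejected) pairs into difference vectors and 0/1 labels.

    Each row is features(first) - features(second), and the label is 1 when
    the first hotel was the one chosen. The order within each pair is
    randomized so that both labels can occur; without that step every label
    would be 1 and a classifier would have nothing to learn.
    """
    rng = random.Random(seed)
    X, y = [], []
    for chosen, rejected in choices:
        if rng.random() < 0.5:
            first, second, label = chosen, rejected, 1
        else:
            first, second, label = rejected, chosen, 0
        X.append([hotels[first][f] - hotels[second][f] for f in features])
        y.append(label)
    return X, y


X, y = encode_pairs(CHOICES, HOTELS, FEATURES)
```

`X` and `y` could then go into any binary classifier. With xgboost, for instance, a fitted `XGBClassifier` exposes `feature_importances_`, with one value per column of `X`. Since each column here is a difference, those importances would describe how much the model relies on the gap in a feature between the two hotels, not which direction of the gap people prefer.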
Assuming the data actually carries predictive signal, would this approach make sense? Or is this a bastardization of how feature importance is supposed to work?