[D] Can someone answer some questions about decision trees?
So I understand that you don’t need to scale features for a decision tree, since it will just find the right place to split on anyway. Do outliers need to be handled, though? Does it matter if some features have much higher variance than others? Lastly, for categorical variables: one of my features (not the target) assigns each observation one of 5 labels. Will it adversely affect the results if those labels are imbalanced? One label makes up roughly 60% of all observations. Should I relabel things so it’s a little more balanced (I can collapse some of the other labels)? Btw, these questions apply to both classification and regression trees.
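To make the scaling premise concrete, here is a minimal sketch of a single regression split in plain Python. It is not how any particular library implements trees; `best_split` and the toy data are made up for illustration, and the split criterion (summed squared error) is an assumption. The point is only that the search looks at the order of the feature values, so multiplying the feature by a constant moves the threshold but leaves the same observations on each side.

```python
def best_split(xs, ys):
    """Return (threshold, left_indices) for the single split on xs that
    minimizes the summed squared error of ys around each side's mean."""
    def sse(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    # Candidate splits are the gaps between consecutive sorted feature values.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # cannot split between two equal values
        left, right = order[:k], order[k:]
        cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
        if best is None or cost < best[0]:
            best = (cost, (lo + hi) / 2, frozenset(left))
    return best[1], best[2]


# Hypothetical toy data: one feature, one numeric target.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]

thr_raw, left_raw = best_split(x, y)
thr_scaled, left_scaled = best_split([v * 1000 for v in x], y)

print(thr_raw, thr_scaled)       # 6.5 6500.0
print(left_raw == left_scaled)   # True: same rows go left either way
```

Only the feature is rescaled here. The target values still enter the squared-error cost directly, so this sketch says nothing about how a regression tree reacts to extreme target values.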
submitted by /u/mydogissnoring