I trained models on my data with several techniques: dimensional analysis (DA), support vector machine (SVM), multi-layer perceptron neural network (ANN) and XGBoost (XGB). XGB achieved the lowest RMSE on the test set. However, for some combinations of input data, the curve predicted by XGB was different from those of the other models.

For example:

Theoretically, this phenomenon follows a power law of the form y(x) = a * x^b. What could cause the curve predicted by XGB to deviate from those of the other models?
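Because log y = log a + b * log x, the power law is a straight line in log-log space, so a and b can be estimated with ordinary least squares on the logged data. This gives a parametric reference curve to compare the learned models against. The sketch below is pure Python; `fit_power_law` is a hypothetical helper, the data are synthetic, and it assumes every x and y is strictly positive.

```python
import math

def fit_power_law(xs, ys):
    """Estimate a and b in y = a * x**b via least squares on log y = log a + b * log x.

    Assumes all x and y are strictly positive (log is undefined otherwise).
    """
    if len(xs) != len(ys) or len(set(xs)) < 2:
        raise ValueError("need paired data with at least two distinct x values")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                # slope in log-log space is the exponent b
    a = math.exp(my - b * mx)    # intercept in log space is log(a)
    return a, b

# Synthetic, noise-free data generated from y = 2 * x**1.5 (illustrative only)
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]
a, b = fit_power_law(xs, ys)
print(round(a, 6), round(b, 6))  # 2.0 1.5
```

Least squares in log space weights relative rather than absolute errors, so the resulting a and b will generally not be the ones that minimize RMSE in the original units.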


(i) imbalanced (unevenly distributed) continuous data in x, or even in y (the target)? Histogram in

(ii) hyperparameters (I tested a fairly thorough search over max_depth, min_child_weight, gamma and eta, and also tried adding regularization parameters)

(iii) the nature of decision-tree split conditions
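Point (iii) can be illustrated without XGBoost itself. A single regression tree predicts a constant inside each leaf, so its fit to a smooth power law is a staircase, and it stays flat beyond the largest training x. The sketch below is a hand-rolled 1-D tree with hypothetical helpers `build_tree` and `predict`; it is not XGBoost's algorithm. A boosted ensemble sums many such trees, so its prediction is still piecewise constant and still flat outside the training range, only with finer steps.

```python
def _sse(values):
    """Sum of squared errors around the mean of `values`."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(xs, ys, depth):
    """Greedy 1-D regression tree: each leaf predicts the mean target of its samples."""
    if depth == 0 or len(set(xs)) < 2:
        return ("leaf", sum(ys) / len(ys))
    pts = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold can separate equal x values
        sse = _sse([y for _, y in pts[:i]]) + _sse([y for _, y in pts[i:]])
        if best is None or sse < best[0]:
            best = (sse, i)
    i = best[1]
    thr = (pts[i - 1][0] + pts[i][0]) / 2  # split halfway between neighbouring x values
    left, right = pts[:i], pts[i:]
    return ("split", thr,
            build_tree([x for x, _ in left], [y for _, y in left], depth - 1),
            build_tree([x for x, _ in right], [y for _, y in right], depth - 1))

def predict(tree, x):
    """Walk down the tree until a leaf is reached and return its constant value."""
    while tree[0] == "split":
        _, thr, left, right = tree
        tree = left if x <= thr else right
    return tree[1]

xs = [float(x) for x in range(1, 21)]  # training range 1..20
ys = [2.0 * x ** 1.5 for x in xs]      # noise-free power law, a = 2, b = 1.5
tree = build_tree(xs, ys, depth=3)

preds = [predict(tree, x / 10) for x in range(10, 201)]  # dense grid over 1..20
print(len(set(preds)) <= 2 ** 3)                   # True: at most 8 levels, a staircase
print(predict(tree, 20.0) == predict(tree, 100.0))  # True: flat beyond the training data
```

A smooth parametric model such as the power law keeps growing as x**b outside the training range, whereas the tree holds the value of its last leaf, which is one way the curves can diverge.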

Is there a way to make my model generalize better (e.g., through post-pruning) so that it follows this curve?

Many thanks!

submitted by /u/drainbamagex
