[D] Can adding more data make a model perform worse?
Hi, I am using an XGBoost regressor for a personal project. Initially I used a data set with measurements from 01 Jan 2016 to 24 Dec 2018, and on the test data I got MAE = 2.332 and MSE = 7.764. I recently got the same data set from the same source, but with measurements from 01 Jan 2016 to 14 May 2019, and on the test data I got MAE = 2.729 and MSE = 12.002. In both cases I tuned the hyperparameters with the same method, using cross-validation. I tried tuning the parameters further for the second data set, but the results did not improve. The differences are not very large, but could using the larger data set have hurt performance, or have I overlooked something?
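To put the two error measures on the same scale, here is a quick sketch that converts the reported MSE values to RMSE and compares each with its MAE. It only uses the numbers above. The dictionary labels and the `summarize` helper are names made up for illustration, and reading the RMSE/MAE ratio as a hint about a few large errors is an assumption, not something I have confirmed on the actual residuals.

```python
import math

# Test-set metrics exactly as reported above; the keys are illustrative labels.
reported = {
    "2016-01-01 to 2018-12-24": {"mae": 2.332, "mse": 7.764},
    "2016-01-01 to 2019-05-14": {"mae": 2.729, "mse": 12.002},
}


def summarize(metrics):
    """Return RMSE and the RMSE/MAE ratio for one set of reported metrics.

    Hypothetical helper: RMSE is on the same scale as MAE, and a larger
    RMSE/MAE ratio means the squared-error metric is dominated more
    strongly by a subset of large errors.
    """
    rmse = math.sqrt(metrics["mse"])
    return {"rmse": rmse, "rmse_over_mae": rmse / metrics["mae"]}


summary = {label: summarize(m) for label, m in reported.items()}

for label, s in summary.items():
    print(f"{label}: RMSE={s['rmse']:.3f}, RMSE/MAE={s['rmse_over_mae']:.3f}")
```

If the ratio is higher for the longer data set, that would suggest the extra error comes more from a few large misses than from a uniform drop in accuracy, which might be worth checking against the residuals for the newly added months.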
submitted by /u/Bigdey