[P] My model performs best without any regularisation. What am I missing?
I’m training a neural net in Keras to predict the outcomes of two-person sports contests. The data aren’t a time series as such, but they are time-ordered, so I’m using walk-forward validation to tune model complexity.
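Here is a minimal sketch of expanding-window walk-forward splitting in plain Python. It assumes the contests are sorted oldest-first and that each fold trains on everything before its test block; the post doesn’t say whether the window expands or slides. The helper `walk_forward_splits` and its parameters are hypothetical, not from any library. scikit-learn’s `TimeSeriesSplit` provides a similar splitter.

```python
from typing import List, Tuple


def walk_forward_splits(n_samples: int, n_folds: int, min_train: int) -> List[Tuple[range, range]]:
    """Return expanding-window (train, test) index ranges for time-ordered data.

    Hypothetical helper for illustration. Assumes rows are sorted oldest-first,
    so every fold trains only on contests that precede its test block.
    """
    if n_folds <= 0 or not 0 < min_train < n_samples:
        raise ValueError("need n_folds > 0 and 0 < min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size == 0:
        raise ValueError("too few samples for the requested number of folds")
    splits = []
    for k in range(n_folds):
        train_end = min_train + k * test_size
        # The last fold absorbs any leftover rows so none are dropped.
        test_end = n_samples if k == n_folds - 1 else train_end + test_size
        splits.append((range(0, train_end), range(train_end, test_end)))
    return splits


if __name__ == "__main__":
    for train, test in walk_forward_splits(n_samples=10, n_folds=3, min_train=4):
        print(f"train rows [0, {train.stop}), test rows [{test.start}, {test.stop})")
```

With `n_samples=10, n_folds=3, min_train=4`, this yields test blocks [4, 6), [6, 8) and [8, 10), each trained on all earlier rows. A sliding window would instead move the training start forward along with the test block.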
I’ve experimented with weight decay, dropout and L1/L2 regularisation. The model always performs best on unseen data with no regularisation at all, which feels intuitively wrong.
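For reference, the penalty-based knobs act on the training objective roughly as follows. Keras’s `L1L2` kernel regulariser adds $\lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$ to the data loss. Under plain SGD with $\lambda_1 = 0$, the L2 term works out to multiplicative weight decay. With adaptive optimisers such as Adam, decoupled weight decay (as in `AdamW`) is no longer equivalent to an L2 penalty. Dropout has no closed-form penalty term and is left out here.

$$
\mathcal{L}(w) = \mathcal{L}_{\text{data}}(w) + \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2
$$

$$
w \leftarrow w - \eta\left(\nabla_w \mathcal{L}_{\text{data}}(w) + 2\lambda_2 w\right) = (1 - 2\eta\lambda_2)\,w - \eta\,\nabla_w \mathcal{L}_{\text{data}}(w) \qquad (\lambda_1 = 0)
$$

So with plain SGD, trying weight decay and an L2 penalty side by side mostly tests the same regulariser at different strengths.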
Has anyone run into something like this before, and is there an obvious reason why it might happen? Failing that, are there any tests I can run to help diagnose the problem?