[D] Overfitting vs. Generalization – a subtle difference
In my view, overfitting does not necessarily imply a lack of generalization, just as generalization cannot be directly associated with the degree of overfitting.
An overfit model is one that has been tuned to achieve the highest possible performance (e.g. the lowest loss) on the dataset it was trained on. This can be measured by the gap between the loss on the validation set and the loss on the training set. For this test to be meaningful, the training and validation sets should have similar distributions. If they do, an overfit model will perform noticeably worse on the validation set than on the training set. This is because, even though the distributions are similar, the model has been tuned to predict correctly only the samples it saw in the training set.
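To make the overfitting check concrete, here is a minimal sketch in Python with NumPy. Everything in it is an illustrative assumption rather than a real setup: the toy data (`sample`, y = sin(2πx) plus Gaussian noise), the deliberately memorizing 1-nearest-neighbor model (`OneNearestNeighbor`) and the `mse` helper are all hypothetical. Training and validation data come from the same distribution, so the validation-minus-training loss gap is the overfitting signal.

```python
import numpy as np


def sample(n, rng):
    """Hypothetical toy distribution: x ~ U(0, 1), y = sin(2*pi*x) + N(0, 0.3^2)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y


class OneNearestNeighbor:
    """Hypothetical, deliberately overfit model: it memorizes the training set
    and predicts the label of the closest training point."""

    def fit(self, x, y):
        self.x_train = np.asarray(x, dtype=float)
        self.y_train = np.asarray(y, dtype=float)
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # For each query point, find the index of the nearest training point.
        idx = np.abs(x[:, None] - self.x_train[None, :]).argmin(axis=1)
        return self.y_train[idx]


def mse(model, x, y):
    """Mean squared error of the model's predictions."""
    return float(np.mean((model.predict(x) - y) ** 2))


rng = np.random.default_rng(0)
x_train, y_train = sample(50, rng)
x_val, y_val = sample(500, rng)  # same distribution, unseen samples

model = OneNearestNeighbor().fit(x_train, y_train)
# Each training point is its own nearest neighbor, so this is exactly 0.0.
train_loss = mse(model, x_train, y_train)
val_loss = mse(model, x_val, y_val)
overfit_gap = val_loss - train_loss
print(f"train={train_loss:.3f}  val={val_loss:.3f}  gap={overfit_gap:.3f}")
```

Since both sets share one distribution, a clearly positive gap here points at memorization of the training samples, not at a distribution shift.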
As for generalization, it can only be evaluated between datasets (training and test) that have different distributions. Ideally, the test distribution should be the most heterogeneous of them all. In my opinion, this is the only way to really assess generalization: the difference between the training loss and the test loss.
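A similarly hedged sketch of the generalization check, again on hypothetical sine-plus-noise data. The model here (a degree-5 least-squares polynomial via NumPy's `Polynomial.fit`) is just my choice for illustration, and the "different distribution" is an assumed wider input range: x in [0, 2] for the test set versus [0, 1] for training. The model barely overfits, yet its train/test gap under the shifted distribution dwarfs its train/validation gap.

```python
import numpy as np
from numpy.polynomial import Polynomial


def sample(n, rng, low=0.0, high=1.0):
    """Hypothetical toy data: x ~ U(low, high), y = sin(2*pi*x) + N(0, 0.3^2)."""
    x = rng.uniform(low, high, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y


def mse(model, x, y):
    """Mean squared error; `model` is any callable mapping x to predictions."""
    return float(np.mean((model(x) - y) ** 2))


rng = np.random.default_rng(0)
x_train, y_train = sample(50, rng)           # seen data, x in [0, 1]
x_val, y_val = sample(500, rng)              # unseen, same distribution
x_test, y_test = sample(500, rng, high=2.0)  # unseen, wider distribution

# A model that fits [0, 1] well without memorizing the noise.
model = Polynomial.fit(x_train, y_train, deg=5)

train_loss = mse(model, x_train, y_train)
val_gap = mse(model, x_val, y_val) - train_loss     # overfitting signal
test_gap = mse(model, x_test, y_test) - train_loss  # generalization signal
print(f"val_gap={val_gap:.3f}  test_gap={test_gap:.3f}")
```

A small `val_gap` alongside a huge `test_gap` is the case I am describing: little overfitting, poor generalization.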
TLDR: Overfitting is indicated when a model underperforms on unseen data whose distribution is similar to that of the seen data. Generalization, on the other hand, is indicated by the performance difference between seen and unseen data with different distributions, where the unseen data ideally represents real-world distributions.
I think this is a misconception most people have, even in industry.
What are your thoughts?