[D] How would one detect data leakage in someone else’s model?
Purely hypothetical. Let’s say I have someone’s model (i.e., their final model weights) as well as their train and test sets. I don’t have any additional validation data readily available.
What kinds of heuristics could I use to evaluate whether there was data leakage from the test set?
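The most direct check I can think of, since I have both sets, is looking for test examples that also show up verbatim in the training set. Here is a rough sketch. It assumes tabular rows stored as tuples of features, and `find_overlap` and the toy data are made up purely for illustration:

```python
def find_overlap(train_rows, test_rows, decimals=6):
    """Return indices of test rows that also appear in the training set.

    Hypothetical helper: rows are tuples of features. Floats are rounded to
    `decimals` places so tiny serialization differences don't hide a copy.
    """
    def normalize(row):
        return tuple(round(x, decimals) if isinstance(x, float) else x for x in row)

    # Hash every normalized training row once, then probe with each test row.
    train_keys = {normalize(r) for r in train_rows}
    return [i for i, r in enumerate(test_rows) if normalize(r) in train_keys]


# Toy data (made up): test rows 0 and 1 are copies of training rows.
train = [(1.0, "a", 3), (2.5, "b", 4), (0.1 + 0.2, "c", 5)]
test = [(2.5, "b", 4), (0.3, "c", 5), (9.9, "z", 0)]

leaked = find_overlap(train, test)
print(f"{len(leaked)}/{len(test)} test rows also appear in train: {leaked}")
```

This only catches exact copies, though; near-duplicates (slightly perturbed or re-encoded examples) would slip through.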
I’d like to distinguish between two cases: 1) there is data leakage, or 2) the model is genuinely that good. Based on performance metrics on the train/test sets alone, I don’t feel like I can tell these two cases apart. Would it be impossible to tell without additional validation data?