[D] Let’s say someone gives you a big, challenging, labeled dataset to train a model on. How do you tell that the labels aren’t mostly random and that putting energy into training a model isn’t a waste of time?
The data could be of any kind, but it takes an expert to annotate it correctly, and since you’re not an expert in that particular area, you can’t eyeball whether the labels make sense. You also try a few baseline models; they can overfit the training data but fail badly on every validation split. At this point, how can you tell whether the problem is just really hard or whether the labels are bad, wrong, or random?
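To make the question concrete, the only generic check I can think of is a label-permutation test: score a simple model on a held-out split, then repeat with the labels shuffled many times and see whether the real score stands out from the shuffled ones. Below is a rough sketch under my own assumptions, not an established recipe: the nearest-centroid baseline, the toy two-blob data, and the names `nearest_centroid_accuracy`, `permutation_test`, and `make_toy_data` are all made up for illustration. Note the limitation, which is really the heart of my question: a small p-value suggests the labels carry some learnable signal, but a large one does not say whether the labels are noise or the baseline is simply too weak.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_val, y_val):
    """Fit a nearest-centroid classifier on the training split and
    return its accuracy on the validation split."""
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    correct = sum(predict(x) == label for x, label in zip(X_val, y_val))
    return correct / len(y_val)


def permutation_test(X, y, n_permutations=200, val_fraction=0.3, seed=0):
    """Compare validation accuracy with the real labels against accuracy
    with shuffled labels. Returns (real_score, null_scores, p_value)."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_val = int(len(y) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    X_train = [X[i] for i in train_idx]
    X_val = [X[i] for i in val_idx]

    def score(labels):
        return nearest_centroid_accuracy(
            X_train, [labels[i] for i in train_idx],
            X_val, [labels[i] for i in val_idx],
        )

    real_score = score(y)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any link between features and labels
        null_scores.append(score(shuffled))

    # Fraction of shuffled runs that do at least as well as the real labels
    # (with the usual +1 correction so the p-value is never exactly zero).
    p_value = (1 + sum(s >= real_score for s in null_scores)) / (1 + n_permutations)
    return real_score, null_scores, p_value


def make_toy_data(n=200, separation=4.0, seed=0):
    """Hypothetical toy data: two Gaussian blobs in 2D with clean labels."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = separation * label
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
real, null_scores, p_value = permutation_test(X, y)
null_mean = sum(null_scores) / len(null_scores)
print(f"real accuracy={real:.3f}, shuffled mean={null_mean:.3f}, p={p_value:.4f}")
```

On real data you would swap the toy blobs for your own features and labels and the nearest-centroid baseline for whatever model you trust most. As far as I know, scikit-learn ships a ready-made version of this idea as `sklearn.model_selection.permutation_test_score`.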