[D] Learning with “noisy data” (but perfect labels)
Many works deal with noisy labels, but has the opposite problem been studied: unreliable data with reliable labels? By that I mean problems where the data to be classified is imperfect and not always sufficient to determine the class label.
An example would be a model that predicts the city in which a photo was taken. Ground-truth labels would be perfect thanks to GPS metadata. If the photo contains the Eiffel Tower, we can predict that the city is Paris. But many pictures contain no useful information; for example, a photo of a dog or a McDonald’s is nearly useless for determining the city.
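To make the setup concrete, here is a minimal toy sketch of the city example in Python. Everything in it is a made-up simplification for illustration: the city list, the `"generic"` feature standing in for dog/McDonald’s photos, the 70% uninformative rate, and the helper names `make_dataset`, `fit_lookup`, and `accuracy`. Labels are always correct; only the inputs vary in how much they reveal.

```python
import random
from collections import Counter

def make_dataset(n, p_uninformative, cities, seed=0):
    """Return (feature, label) pairs whose labels are always correct.

    Hypothetical toy setup: an informative photo exposes a landmark unique
    to its city; an uninformative one exposes only the value "generic".
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        city = rng.choice(cities)
        if rng.random() < p_uninformative:
            feature = "generic"            # nothing about the city is visible
        else:
            feature = "landmark:" + city   # uniquely identifies the city
        data.append((feature, city))
    return data

def fit_lookup(data):
    """Predict the most frequent label seen for each feature value."""
    counts = {}
    for feature, label in data:
        counts.setdefault(feature, Counter())[label] += 1
    return {f: c.most_common(1)[0][0] for f, c in counts.items()}

def accuracy(model, data):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model[f] == y for f, y in data) / len(data)

cities = ["Paris", "Rome", "Tokyo", "Lima"]  # assumed, equally likely
data = make_dataset(10_000, p_uninformative=0.7, cities=cities)
model = fit_lookup(data)

# Evaluated on the training data itself, which is enough for this illustration.
informative = [(f, y) for f, y in data if f != "generic"]
generic = [(f, y) for f, y in data if f == "generic"]
acc_informative = accuracy(model, informative)
acc_generic = accuracy(model, generic)
print(acc_informative, acc_generic)
```

In this toy version the landmark photos are classified perfectly, while on the `"generic"` photos no model can beat guessing the most common city, so accuracy there stays near 1 in 4 even though every label is correct. Real photos are of course not split this cleanly into informative and uninformative.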
What is the best way to train a classifier when such “noisy examples” (for lack of a better term) are very common?