[D] How to handle noisy training labels in supervised learning?
In machine learning, training labels are often noisy, e.g. due to mislabelling. For neural networks, which require large quantities of training data, this manifests as a trade-off between dataset quality and quantity. For instance, a model may fit a noisily labelled training set well, yet when evaluated on a manually annotated test set it appears to generalize poorly.
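For concreteness, here is a minimal sketch of the phenomenon (synthetic data, a 1-nearest-neighbour classifier standing in for any high-capacity model that can memorize its training set; all names and parameters are illustrative): a model trained on labels with 30% symmetric flips scores perfectly against its own noisy training labels, but much worse on a cleanly labelled test set.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Two Gaussian features; the true label is the sign of the first feature.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, y

X_train, y_train = make_data(2000)
X_test, y_test = make_data(1000)  # clean, "manually annotated" test set

# Simulate annotation noise: flip 30% of the training labels at random.
noise_rate = 0.3
flip = rng.random(len(y_train)) < noise_rate
y_noisy = np.where(flip, 1 - y_train, y_train)

def knn1_predict(X_ref, y_ref, X_query):
    # 1-nearest-neighbour: copy the label of the closest reference point.
    d = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(-1)
    return y_ref[d.argmin(axis=1)]

def accuracy(y_pred, y_true):
    return float((y_pred == y_true).mean())

# "Training accuracy" measured against the noisy labels themselves:
# each point is its own nearest neighbour, so this is trivially perfect.
acc_train_noisy = accuracy(knn1_predict(X_train, y_noisy, X_train), y_noisy)

# Generalization, measured on the clean test set.
acc_test_noisy = accuracy(knn1_predict(X_train, y_noisy, X_test), y_test)
acc_test_clean = accuracy(knn1_predict(X_train, y_train, X_test), y_test)

print(f"train acc vs noisy labels: {acc_train_noisy:.2f}")
print(f"test acc (trained on noisy labels): {acc_test_noisy:.2f}")
print(f"test acc (trained on clean labels): {acc_test_clean:.2f}")
```

The model "performs well" by its own noisy yardstick (100% training accuracy) while its clean-test accuracy collapses toward roughly 1 − noise rate, which is exactly the gap described above.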
What are some ways a machine learning practitioner can better deal with this problem?