[R] CVPR 2019 Noise-Tolerant Training work "Learning to Learn from Noisy Labeled Data"
This work achieves promising results with meta-learning. Our result on Clothing1M is comparable with theirs. However, their meta-learning formulation looks extremely complex to use in practice.
Too many hyper-parameters are involved, as shown in their Algorithm 1 and the implementation details in Section 4.2:
- The number of synthetic mini-batches (meta-training iterations) M;
- Meta-training step size alpha;
- Meta-learning rate eta;
- Student learning rate beta;
- Exponential moving average (EMA) decay gamma;
- The threshold for data filtering tau;
- The number of samples whose labels are replaced, rho.
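To make the complexity concrete, here is a minimal numpy sketch of where each hyper-parameter enters one meta-iteration. It is a first-order toy on a linear model with squared loss, not the paper's actual deep network or exact objective; all values and the corruption model are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyper-parameters from the paper's Algorithm 1 (values here are placeholders)
M = 10        # number of synthetic mini-batches per meta-iteration
alpha = 0.1   # meta-training step size (inner update)
eta = 0.01    # meta-learning rate (outer update)
beta = 0.01   # student learning rate (conventional update)
gamma = 0.99  # EMA decay for the mentor/teacher weights

# Toy stand-in: linear model w, squared loss, one random mini-batch
w = rng.normal(size=3)   # student weights
w_mentor = w.copy()      # mentor = EMA copy of the student
X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5])

def grad(w, X, y):
    """Gradient of the mean squared error 0.5*||Xw - y||^2 / n."""
    return X.T @ (X @ w - y) / len(y)

for _ in range(M):
    # Inner step: one SGD step on a synthetically corrupted mini-batch
    y_noisy = y + rng.normal(scale=1.0, size=y.shape)  # synthetic label noise
    w_inner = w - alpha * grad(w, X, y_noisy)
    # Outer (meta) step: pull the post-noise model toward the mentor's fit.
    # First-order approximation (gradient taken at w_inner, applied to w),
    # standing in for the paper's consistency meta-loss.
    w = w - eta * grad(w_inner, X, X @ w_mentor)

# Conventional student update on the real training batch
w = w - beta * grad(w, X, y)
# Mentor tracks the student via exponential moving average
w_mentor = gamma * w_mentor + (1 - gamma) * w
```

Even in this stripped-down form, five step sizes/decays plus M must be tuned jointly, which is the complexity being criticised above.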
On top of that, the training strategies (iterative training combined with iterative data filtering/cleaning, reusing the best model from the previous round as the mentor, etc.) make the method hard to handle in practice.
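The filtering/relabelling step itself can be sketched as follows, assuming (as one plausible reading) that tau thresholds the mentor's confidence in the given label and rho caps how many discarded samples get relabelled with the mentor's top prediction; the probabilities below are made up for illustration.

```python
import numpy as np

tau = 0.8  # confidence threshold for keeping a (possibly noisy) label
rho = 2    # number of discarded samples whose labels get replaced

# Mentor soft predictions for 6 samples over 3 classes (made-up numbers)
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
    [0.40, 0.30, 0.30],
    [0.05, 0.90, 0.05],
    [0.30, 0.35, 0.35],
])
labels = np.array([0, 1, 0, 2, 1, 0])  # given (noisy) labels

# Keep samples where the mentor agrees with the given label confidently
conf_in_label = probs[np.arange(len(labels)), labels]
keep = conf_in_label >= tau

# Among discarded samples, relabel the rho the mentor is most sure about
discarded = np.flatnonzero(~keep)
order = discarded[np.argsort(-probs[discarded].max(axis=1))]
new_labels = labels.copy()
new_labels[order[:rho]] = probs[order[:rho]].argmax(axis=1)
```

Here only samples 0 and 4 survive the filter, and samples 2 and 1 (the two most confident discards) are relabelled to the mentor's prediction. Each round of iterative training reshuffles this split, which is another moving part to debug.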
However, the ideas are interesting and novel:
- Oracle/mentor (consistency loss): for the meta-test to be reliable, the teacher/mentor model must itself be robust to the real noisy examples. They therefore apply iterative training and iterative data cleaning so that the meta-test consistency loss acts as a reliable optimisation oracle against real noise.
- Unaffected by synthetic noise: meta-training exposes the model to synthetically corrupted mini-batches; meta-testing then evaluates the updated model's consistency with the oracle and maximises it, i.e., it trains the model to remain unaffected after seeing synthetic noise.
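One plausible instantiation of that consistency objective is a KL divergence between the mentor's and the post-noise student's softmax outputs (the exact form is an assumption here, not taken from the paper):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(student_logits, mentor_logits):
    """KL(mentor || student), averaged over the batch -- one plausible
    stand-in for the meta-test consistency objective."""
    p = softmax(mentor_logits)
    q = softmax(student_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

When the student's predictions after the synthetic-noise update match the mentor's exactly, the loss is zero; any drift caused by the noise makes it positive, so minimising it pushes the student towards noise-invariance.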
Is meta-learning really a good solution in practice with so many configurations?
Or could we simplify its modelling to make it easier to use in practice?