[D] Per-channel or per-sample loss calculation and averaging in a batch?
Let’s say we have an N-class semantic segmentation problem. On each iteration (for each batch) we can calculate Dice loss in two ways: (1) calculate the average loss over classes for each sample in the batch and then average over the batch, or (2) calculate the average loss per class across the batch and then average over the classes present in the batch. Which one is better, and why? Or is there no difference at all? Can it affect how the model learns to segment small or big objects? Any related articles?
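To make the two options concrete, here is a minimal NumPy sketch. All function names are my own, and for option (2) I assume the "per class in a batch" statistics are pooled over the whole batch (intersection and denominator summed over all samples) before computing the Dice score per class; other interpretations are possible:

```python
import numpy as np

EPS = 1e-6  # smoothing term to avoid division by zero

def dice_per_class(pred, target, eps=EPS):
    """Soft Dice score per class for one sample.

    pred, target: arrays of shape (C, H, W) with probabilities / one-hot masks.
    Returns an array of shape (C,).
    """
    inter = (pred * target).sum(axis=(1, 2))
    denom = pred.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    return (2 * inter + eps) / (denom + eps)

def dice_loss_sample_first(preds, targets):
    """Option (1): average over classes per sample, then over the batch.

    preds, targets: arrays of shape (N, C, H, W).
    """
    per_sample = [1 - dice_per_class(p, t).mean() for p, t in zip(preds, targets)]
    return float(np.mean(per_sample))

def dice_loss_class_first(preds, targets, eps=EPS):
    """Option (2): pool statistics over the batch per class, then average
    over classes. Small objects in one sample get merged with the same
    class's pixels from the rest of the batch before the ratio is taken."""
    inter = (preds * targets).sum(axis=(0, 2, 3))   # (C,)
    denom = preds.sum(axis=(0, 2, 3)) + targets.sum(axis=(0, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    return float(1 - dice.mean())
```

With a batch size of 1 the two reduce to the same value; for larger batches they diverge, because option (2) lets large objects elsewhere in the batch dominate a class's pooled statistics, while option (1) weights every sample equally.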
submitted by /u/AdelSexy