[D] Effect of Oversampling on classifiers, when combined with image transformations
I am trying to understand the negative consequences of oversampling in the context of image classification. If I am using a decent amount amount of image transformations, I believe it will effectively be equivalent to SMOTE for tabular data, since I am not exactly repeating any image in a batch. Does the behaviour and test set accuracy of a classifier in any way depend on the actual class distribution in the train set and by oversampling am I doing any harm?
To take an example I was training a classifier on a dataset with 5 classes, having heavy class imbalance. To balance it out I oversampled the minority classes so that all classes have equal number of images. This caused a significant performance drop on the test set that I have, while cross validation performance was fairly high on the oversampled set. When analysing the class distributions I saw that for the original train set the distribution was: 1,3,2,4,5 with decreasing number of samples. The test set has a class distribution 3,1,2,4,5 but the predictions after training on oversampled data have distribution 4,3,2,5,1. Mathematically speaking, how can this behaviour of over predicting a less frequently occuring class be explained?