
[D] Effect of oversampling on classifiers when combined with image transformations

I am trying to understand the negative consequences of oversampling in the context of image classification. If I am using a decent amount of image transformations, I believe it is effectively equivalent to SMOTE for tabular data, since no image is repeated exactly within a batch. Does the behaviour and test set accuracy of a classifier depend in any way on the actual class distribution of the train set, and am I doing any harm by oversampling?
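For concreteness, here is a minimal sketch of the setup I have in mind, assuming PyTorch and a torchvision-style dataset that exposes integer labels via a `targets` attribute (as `ImageFolder` does); the transform values are illustrative only. Minority classes are oversampled with a weighted sampler, while random transformations keep resampled images from being exact repeats within a batch.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import transforms

# Illustrative augmentation pipeline: each resampled image is transformed
# differently, so exact duplicates rarely appear in a batch.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.2, 0.2, 0.2),
    transforms.ToTensor(),
])

def make_oversampled_loader(dataset, batch_size=32):
    """Oversample minority classes so each class is drawn roughly equally often.

    `dataset` is assumed to expose integer class labels via `dataset.targets`.
    """
    targets = torch.as_tensor(dataset.targets)
    class_counts = torch.bincount(targets)
    # Weight each sample inversely to its class frequency.
    sample_weights = 1.0 / class_counts[targets].float()
    sampler = WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(dataset),  # one "epoch" of resampled data
        replacement=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```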

To take an example, I was training a classifier on a dataset with 5 classes and heavy class imbalance. To balance it out, I oversampled the minority classes so that all classes had an equal number of images. This caused a significant performance drop on my test set, while cross-validation performance on the oversampled set was fairly high. When analysing the class distributions, I saw that for the original train set the classes ordered by decreasing sample count were 1, 3, 2, 4, 5. The test set has the order 3, 1, 2, 4, 5, but the predictions after training on the oversampled data have the order 4, 3, 2, 5, 1. Mathematically speaking, how can this behaviour of over-predicting a less frequently occurring class be explained?
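This is roughly how I compared the distributions; a small sketch assuming NumPy and hypothetical arrays of integer labels in {0, ..., 4} for the train set, the test set, and the model's test-set predictions.

```python
import numpy as np

def class_distribution(labels, num_classes=5):
    """Return per-class counts and class indices ordered by decreasing frequency."""
    counts = np.bincount(labels, minlength=num_classes)
    order = np.argsort(-counts)
    return counts, order

# Hypothetical label arrays:
# train_labels, test_labels, predicted_labels = ...
# _, train_order = class_distribution(train_labels)      # ordering of the train set
# _, test_order  = class_distribution(test_labels)       # ordering of the test set
# _, pred_order  = class_distribution(predicted_labels)  # ordering of the predictions
```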

submitted by /u/Atom_101