[D] Why re-sampling imbalanced data isn’t always the best idea
I often work with people (in medical studies) who have a lot of “knowledge” of statistical methods but lack the required basics, or an understanding of what goes on inside some algorithms. That’s perfectly fine; after all, that’s not their job but mine.
Over time, though, I’ve come across a few problems where, because the “needed significance” wasn’t reached, some really basic over-sampling was applied. I’ve thrown together a really simple example that anyone should be able to follow (no deep statistical knowledge required) to show what can happen. Maybe it helps you, or you can put it to use yourself: