[D] How much of an effect, if any, does batch size have when doing hyperparameter optimization?
I have been using scikit-optimize to do hyperparameter search (gp_minimize specifically) for a neural network. I am working on a binary classification problem with a significant class imbalance. I have been using a batch size of 10, but I just came across a tweet and notebook by Francois Chollet where he recommended using a high batch size in class imbalance problems, so that each batch contains at least a few positive examples.
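To make that concern concrete, here's a quick back-of-the-envelope sketch of why small batches are a problem under imbalance. It assumes batches are drawn i.i.d. from the data (a simplification of shuffled epochs) and that positives make up a fraction `p` of the samples; the function name is just illustrative:

```python
# Probability that a batch contains zero positive examples,
# assuming each sample is independently positive with probability p.
def prob_no_positives(p: float, batch_size: int) -> float:
    return (1.0 - p) ** batch_size

# With a 1% positive rate:
for b in (10, 32, 256):
    print(b, prob_no_positives(0.01, b))
# batch size 10  -> ~90% of batches see no positives at all
# batch size 256 -> under 8% of batches see no positives
```

So at a batch size of 10 with a 1% positive rate, the vast majority of gradient updates would be computed from purely negative batches, which is the situation Chollet's tip is trying to avoid.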
My question: can I take the best network architectures I found via my hyperparameter search (run with a batch size of 32) and simply retrain them with the same hyperparameters but a higher batch size?
Or would batch size have a significant effect on the hyperparameter optimization itself, meaning I would be better off redoing the search with a larger batch size?
Going off of that, any recommendations for how to select a batch size? My data contains between 400,000 and 500,000 samples, and I'm feeding 7 features into the network.
On a similar note about dealing with class imbalance: my samples are weighted to begin with (I am working on a physics problem, and the weight for each sample is the probability that the sample occurs), but I was thinking about increasing the weights of the positive data points to help counteract the class imbalance. Thoughts on this?
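To clarify what I mean by increasing the positive weights, here's a rough sketch of what I had in mind: multiply my existing physics weights by a class-balancing factor so positives get more influence. The function name and arrays are just illustrative, and the factor shown (the negative-to-positive ratio) is only one possible choice:

```python
import numpy as np

def rebalanced_weights(weights: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Scale per-sample physics weights by a class-balancing factor.

    weights: the original per-sample occurrence probabilities
    y:       0/1 class labels
    """
    pos_frac = y.mean()
    # Up-weight positives by the negative-to-positive ratio, so the
    # positive class carries comparable total influence during training.
    class_factor = np.where(y == 1, (1 - pos_frac) / pos_frac, 1.0)
    return weights * class_factor

y = np.array([0, 0, 0, 1])
w = np.array([0.2, 0.3, 0.1, 0.4])
print(rebalanced_weights(w, y))  # [0.2, 0.3, 0.1, 1.2]
```

The resulting array could then be passed wherever the training loop accepts per-sample weights.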
I hope my questions make sense. Thanks for any help!