[D] What is the latest consensus on the effect of batch size on generalization?
In general, is the following true?
small batch size + long training ~ large batch size + short training
Just wondering, because I’m not aware of any standard literature on this topic; if anyone knows any good papers, I would appreciate some references!
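To make the question concrete, here is a minimal sketch of the kind of controlled comparison I have in mind: hold the total number of gradient updates fixed and vary only the batch size (so the small-batch run uses fewer epochs and the large-batch run uses more, in this toy setup). Everything here (the toy linear-regression task, the learning rate, the step counts) is made up for illustration, not taken from any paper:

```python
import numpy as np

def sgd_train(batch_size, epochs, lr=0.05, seed=0):
    # Toy linear-regression task; purely illustrative.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(512, 4))
    true_w = np.array([1.0, -2.0, 0.5, 3.0])
    y = X @ true_w + 0.1 * rng.normal(size=512)

    w = np.zeros(4)
    for _ in range(epochs):
        idx = rng.permutation(512)
        for start in range(0, 512, batch_size):
            b = idx[start:start + batch_size]
            # Mini-batch gradient of mean squared error.
            grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    # Final training MSE as a crude proxy for fit quality.
    return np.mean((X @ w - y) ** 2)

# Both runs perform the same number of gradient updates (320),
# differing only in batch size and epoch count.
loss_small = sgd_train(batch_size=8, epochs=5)    # 64 updates/epoch * 5
loss_large = sgd_train(batch_size=64, epochs=40)  # 8 updates/epoch * 40
print(loss_small, loss_large)
```

On a toy quadratic like this, both settings reach a similar loss; the open question is whether (and under what learning-rate scaling) the same equivalence holds for generalization in deep networks.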
submitted by /u/Minimum_Zucchini