[D] Biggest batch size that should be used: the biggest even number that the GPU memory can handle, or the biggest power of 2 that the GPU memory can handle? Also, why do GPUs love powers of 2?
I have heard that GPUs love powers of 2, and that's why embedding sizes and batch sizes are so often some power of 2 (64, 128, 256, 512, 1024, etc.).
But I have never seen a concrete explanation for why this is.
Also, should the max batch size be taken as the biggest even number that the GPU memory can handle, or the biggest power of 2 that the GPU memory can handle?
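For context on the "biggest batch size the GPU memory can handle" part, here is a minimal sketch (not from the original post) of how that limit is often found empirically in PyTorch: keep doubling the batch size until a CUDA out-of-memory error is raised and report the last size that fit. The function name `find_max_batch_size`, the toy model, and the input shape are hypothetical placeholders.

    import torch
    import torch.nn as nn

    def find_max_batch_size(model, input_shape, start=1, device="cuda"):
        """Double the batch size until the GPU runs out of memory; return the last size that fit."""
        model = model.to(device)
        batch, last_ok = start, None
        while True:
            try:
                x = torch.randn(batch, *input_shape, device=device)
                loss = model(x).sum()
                loss.backward()                      # include a backward pass: training also stores activations and grads
                model.zero_grad(set_to_none=True)
                last_ok = batch
                batch *= 2                           # doubling gives powers of 2 (times `start`)
                del x, loss
            except RuntimeError as e:                # PyTorch surfaces CUDA OOM as a RuntimeError
                if "out of memory" not in str(e):
                    raise
                torch.cuda.empty_cache()
                return last_ok

    if __name__ == "__main__":
        net = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
        print("Largest batch that fit:", find_max_batch_size(net, (1024,)))

Note that doubling like this only ever tests powers of 2 of the starting size, which is exactly the convention the question is asking about versus simply taking the largest even batch that fits.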
submitted by /u/BatmantoshReturns