[D] BatchNorm alternatives 2019
The main reason why people BatchNorm despite being compute heavy (~25% of total model) is because of fast official cudnn implementations. Same reason why RNNs other than LSTM and GRU never went popular.
Also, BatchNorm requires computing square root and division which require full precision to work properly. Going half-precision or applying quantization is not easy.
Anyway, are there any new methods that can dethrone BatchNorm entirely? Some papers:
Generalized Hamming Network https://arxiv.org/abs/1710.10328