[D] State-of-the-Art Activation Functions: GELU, SELU, ELU, ReLU and more, with visualizations of the activation functions and their derivatives.
(Intermediate level and above: you can probably skip at least the first two sections and start at ReLU.)
I recently wrote a long-form post explaining and visualizing the various activation functions. The math is not that complicated, but knowing the pros and cons of each function, or even just knowing that it exists, can pay off when you choose one for a model.
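To give a sense of how little math is involved, here is a minimal sketch of the four functions named in the title, written with the Python standard library only. The function names, the default `alpha=1.0` for ELU, and the finite-difference helper `numerical_derivative` are my own choices for illustration and are not taken from any particular library; the GELU shown is the exact erf-based form, not the tanh approximation.

```python
import math

# Commonly used SELU constants (assumed here, written out to double precision).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


def elu(x, alpha=1.0):
    """ELU: identity for x > 0, smooth exponential curve for x <= 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """SELU: an ELU with fixed alpha, multiplied by a fixed scale."""
    return SELU_SCALE * elu(x, SELU_ALPHA)


def gelu(x):
    """GELU (exact form): x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def numerical_derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x); illustrative helper only."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Print each function and its estimated derivative at a few points.
    for f in (relu, elu, selu, gelu):
        for x in (-2.0, -0.5, 0.5, 2.0):
            print(f.__name__, x, f(x), numerical_derivative(f, x))
```

The derivative here is estimated numerically to keep the sketch short; it is unreliable right at x = 0 for ReLU, where the function has a kink.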
Any feedback is appreciated. I share what I learn so that other people can learn from it as well. This is not an advanced topic, but the post does provide an overview of SOTA activation functions, and in that spirit I plan to write similar posts on more advanced topics in the future.