[D] GELU better than ReLU?
I stumbled across a 2016 paper today that presents reasonable evidence that Gaussian error linear units (GELUs) perform better than ReLU.
https://arxiv.org/pdf/1606.08415.pdf
I have a couple ideas about why I’ve never heard of this before and I’m curious what others think.
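For anyone who hasn't seen it, here is a rough sketch of the two activations as I understand them from the paper: GELU(x) = x * Phi(x), where Phi is the standard normal CDF, plus the tanh approximation the paper mentions. The function names (`relu`, `gelu`, `gelu_tanh`) are my own, and this is plain scalar Python for illustration, not how any framework implements it.

```python
import math


def relu(x: float) -> float:
    # ReLU: pass positive inputs through, zero out the rest.
    return max(0.0, x)


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi written in terms of erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU given in the paper.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


if __name__ == "__main__":
    # Print the three functions side by side on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"{x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}  gelu_tanh={gelu_tanh(x):.4f}")
```

The main visible difference is that GELU is smooth and goes slightly negative for small negative inputs, whereas ReLU is exactly zero there.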
submitted by /u/AbitofAsum