[D] Xavier Glorot initialization intuition
https://i.ytimg.com/vi/OAb_p-SXSeM/maxresdefault.jpg
If the input/output sizes of a layer get bigger, the range [-x, x] and the standard deviation of the initial weights both decrease. Wouldn't this make the initialized nodes more similar to each other? Wouldn't it make more sense to spread out the initialization when there are many nodes, instead of making it less spread out?
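To make the question concrete, here is a rough sketch of what the formula does, assuming the usual Glorot uniform bound x = sqrt(6 / (fan_in + fan_out)) and unit-variance inputs. The function names (`glorot_limit`, `preactivation_variance`) are made up for illustration and are not from any library. It suggests that each individual weight gets a narrower range as the layer grows, but the sum over all inputs keeps a variance of roughly 1.

```python
import math
import random


def glorot_limit(fan_in, fan_out):
    """Bound x of the uniform range [-x, x] (assumed Glorot uniform formula)."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def preactivation_variance(fan, trials=1000, seed=0):
    """Empirical variance of z = sum_i w_i * x_i for a square layer.

    Assumes fan_in == fan_out == fan, weights drawn from U(-x, x),
    and inputs x_i drawn from a standard normal (illustrative choice).
    """
    rng = random.Random(seed)
    limit = glorot_limit(fan, fan)
    total = 0.0
    total_sq = 0.0
    for _ in range(trials):
        z = sum(rng.uniform(-limit, limit) * rng.gauss(0.0, 1.0) for _ in range(fan))
        total += z
        total_sq += z * z
    mean = total / trials
    return total_sq / trials - mean * mean


if __name__ == "__main__":
    for fan in (10, 100, 1000):
        # The per-weight range shrinks with fan, the variance of the sum should not.
        print(fan, round(glorot_limit(fan, fan), 4), round(preactivation_variance(fan), 3))
```

Under these assumptions the weights are still drawn independently, so nodes do not become identical; only the scale of each weight shrinks, so that a node summing over more inputs does not produce a larger output.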
submitted by /u/__BetterAgent__