In his award-winning neural network overview (yes, it won the first best-paper award of the journal Neural Networks), Schmidhuber discusses the LSTM (https://arxiv.org/pdf/1404.7828.pdf, p. 19) as follows:
“The basic LSTM idea is very simple. Some of the units are called Constant Error Carousels (CECs). Each CEC uses as an activation function f, the identity function, and has a connection to itself with fixed weight of 1.0. Due to f’s constant derivative of 1.0, errors backpropagated through a CEC cannot vanish or explode (Sec. 5.9) but stay as they are (unless they “flow out” of the CEC to other, typically adaptive parts of the NN).”
What does Schmidhuber mean here? Where are the fixed weight of 1.0 and the identity activation function? Can somebody relate this to the common LSTM equations, for example those in https://colah.github.io/posts/2015-08-Understanding-LSTMs/?
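To make the question a bit more concrete: in the notation of Colah's post, the cell-state update is

    c_t = f_t * c_{t-1} + i_t * c~_t

My guess is that the c_{t-1} term is the CEC's self-connection: no squashing nonlinearity is applied along the c_{t-1} -> c_t path (that would be the identity activation), and in the original 1997 LSTM, which had no forget gate, the coefficient on c_{t-1} was exactly the fixed weight 1.0, so dc_t/dc_{t-1} = 1 and backpropagated errors stay as they are. Is that the right reading, and does the forget gate f_t then mean the weight is no longer fixed at 1.0 in modern LSTMs?

Here is a minimal numerical sketch of that reading (my own illustration, not from the paper; the gate values are arbitrary and the variable names are mine):

    import numpy as np

    T = 100                      # number of time steps
    f = 1.0                      # carry weight fixed at 1.0 -> the CEC case
    i = np.random.rand(T)        # input gate values (arbitrary)
    g = np.random.randn(T)       # candidate cell values (arbitrary)

    c = 0.0
    for t in range(T):
        c = f * c + i[t] * g[t]  # cell-state update; identity activation on the carry path

    # The error backpropagated along the carry path over T steps scales as f**T:
    print(f ** T)    # 1.0 when f = 1.0: the error neither vanishes nor explodes
    print(0.9 ** T)  # ~2.7e-5 if the self-weight were 0.9 instead: it vanishes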
submitted by /u/ManiacMalcko