[D] LSTM – Constant Error Carousel
In his award-winning neural network overview (yes, it won the first best paper award of the journal Neural Networks), Schmidhuber describes the LSTM (https://arxiv.org/pdf/1404.7828.pdf, p. 19) as follows:
“The basic LSTM idea is very simple. Some of the units are called Constant Error Carousels (CECs). Each CEC uses as an activation function f, the identity function, and has a connection to itself with fixed weight of 1.0. Due to f’s constant derivative of 1.0, errors backpropagated through a CEC cannot vanish or explode (Sec. 5.9) but stay as they are (unless they “flow out” of the CEC to other, typically adaptive parts of the NN).”
What does Schmidhuber mean there? Where do the fixed weight of 1.0 and the identity activation function show up? Can somebody relate this to the common LSTM equations, for example those in https://colah.github.io/posts/2015-08-Understanding-LSTMs/ ?
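
To make the question concrete, here is one possible reading of the quote as a minimal Python sketch. It is not a definitive answer. It assumes the CEC corresponds to the cell state in the common equations, C_t = f_t * C_{t-1} + i_t * C̃_t, with the forget gate f_t held at 1 (the original LSTM had no forget gate). The function names `backprop_factor` and `cell_state_update`, the 50 time steps, and all numeric values are hypothetical and chosen only for illustration. The gates are treated as given constants, which ignores their dependence on earlier hidden states.

```python
import math

def backprop_factor(weight, activation_derivative, pre_activations):
    """Multiply the local derivatives weight * f'(z) along a self-loop.

    This is the factor by which an error signal is scaled when it is
    backpropagated through the self-connection once per time step.
    """
    factor = 1.0
    for z in pre_activations:
        factor *= weight * activation_derivative(z)
    return factor

def identity_prime(z):
    # Derivative of the identity activation f(z) = z.
    return 1.0

def tanh_prime(z):
    # Derivative of tanh, always <= 1.
    return 1.0 - math.tanh(z) ** 2

# Hypothetical pre-activations for 50 time steps.
steps = [0.5] * 50

# CEC-style unit: identity activation, self-weight fixed at 1.0.
cec_factor = backprop_factor(1.0, identity_prime, steps)

# Ordinary recurrent unit for comparison: tanh activation, same weight.
tanh_factor = backprop_factor(1.0, tanh_prime, steps)

def cell_state_update(c_prev, f_gate, i_gate, candidate):
    """Cell state update C_t = f_t * C_{t-1} + i_t * C~_t (gates as constants)."""
    return f_gate * c_prev + i_gate * candidate

# Finite-difference estimate of dC_t / dC_{t-1} with the forget gate at 1.
eps = 1e-6
c_prev, i_gate, candidate = 2.0, 0.3, 0.7  # hypothetical values
cell_derivative = (
    cell_state_update(c_prev + eps, 1.0, i_gate, candidate)
    - cell_state_update(c_prev, 1.0, i_gate, candidate)
) / eps

print("CEC factor over 50 steps: ", cec_factor)
print("tanh factor over 50 steps:", tanh_factor)
print("dC_t/dC_{t-1} with f_t = 1:", cell_derivative)
```

Under these assumptions, the error factor through the identity unit stays at exactly 1.0 no matter how many steps are unrolled, while the tanh unit's factor shrinks toward zero. In this reading, the "fixed weight of 1.0" would be the implicit coefficient on C_{t-1} when f_t = 1, and the "identity activation" would be the absence of any squashing function around the cell state sum. With a learned forget gate the coefficient becomes f_t, so the carousel is only approximately constant.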