[D] DeepMind's FTW multi-speed RNN implementation and their use of Zt
I am trying to re-implement the FTW multi-speed RNN proposed in Section 2.1 of this paper (from DeepMind). Their use of Zt is quite confusing.
It seems like something I need to take care of during the loss and gradient calculation. But then in Figure S10 it is clearly part of the NN model. Or is Zt just the output of the fast RNN?
Does anyone have an idea how to implement this?