[D] Which part of the RNN architecture stores the sequential memory?
I was reading Andrej Karpathy’s blog post on RNNs to get familiarised with how RNNs work, both mathematically and intuitively. From my understanding, there are three sets of parameters to optimise:
- Wxh – multiplied with the new input to give its contribution to the hidden state
- Whh – multiplied with the rolling hidden state; the result is added to the input term above
- Why – multiplied with the rolling hidden state to obtain the output
Then there is the rolling hidden state (H), which accumulates information from all the inputs seen so far. We optimise the loss calculated from the output to find the best values for the above parameters.
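To make my understanding concrete, here is a minimal NumPy sketch of the forward pass as I read it from the blog. The layer sizes, the random initialisation, the random input sequence and the function name `rnn_step` are my own assumptions for illustration, and I have left out bias terms to keep it short.

```python
import numpy as np

# Arbitrary sizes, chosen only for illustration (assumption).
hidden_size, input_size, output_size = 4, 3, 2

rng = np.random.default_rng(0)
# The three parameter sets; small random values as a stand-in for trained weights.
Wxh = rng.normal(scale=0.1, size=(hidden_size, input_size))
Whh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Why = rng.normal(scale=0.1, size=(output_size, hidden_size))

def rnn_step(x, h):
    """One time step: combine the new input with the previous hidden state."""
    h_new = np.tanh(Wxh @ x + Whh @ h)  # updated rolling hidden state
    y = Why @ h_new                     # output read off the hidden state
    return h_new, y

# Run a toy sequence of 5 random inputs through the same weights.
h = np.zeros(hidden_size)               # H starts empty
xs = rng.normal(size=(5, input_size))
for x in xs:
    h, y = rnn_step(x, h)               # H is carried from step to step
```

Note that Wxh, Whh and Why are the same at every step of the loop; the only thing that changes from one step to the next is `h`.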
What I am not able to visualise and understand is where the so-called sequential memory is stored.
Is it stored in the vector H (the rolling hidden state) or in the weight matrix Whh?
In either case, could you also give some intuition on how a matrix or vector can hold memory?