[D] Layer Complexity of Recurrent NNs in the Transformer Paper
https://arxiv.org/pdf/1706.03762.pdf — Table 1 of this paper says the per-layer complexity of self-attention is N^2*d, which I understand. What I don't understand is the complexity of recurrent layers, which is listed as d^2*N. Does anyone know how this comes about? For reference, below is my own toy sketch of the two layer types being compared (my code, not from the paper).
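A minimal sketch, assuming a vanilla RNN cell and single-head dot-product attention (the weight names W_h, W_x and the toy sizes are my own, not from the paper); n is the sequence length and d is the representation dimension:

    import numpy as np

    n, d = 10, 64  # toy sizes; only the asymptotics matter

    x = np.random.randn(n, d)  # input sequence: one d-dim vector per position

    # Recurrent layer: n sequential steps, each multiplying the d-dim hidden
    # state by a d x d recurrence matrix -> ~d^2 work per step -> O(n * d^2).
    W_h = np.random.randn(d, d)  # hypothetical recurrence weights
    W_x = np.random.randn(d, d)  # hypothetical input weights
    h = np.zeros(d)
    for t in range(n):
        h = np.tanh(W_h @ h + W_x @ x[t])  # each step costs ~d^2

    # Self-attention layer: every pair of positions interacts via a d-dim dot
    # product -> n^2 pairs, d work each -> O(n^2 * d).
    scores = x @ x.T / np.sqrt(d)  # n x n score matrix, ~n^2 * d multiply-adds
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ x  # another ~n^2 * d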
submitted by /u/MichaelStaniek