[D] Question on Quaternion-Recurrent-Neural-Networks
I understand that the forward pass has to be adapted for a quaternion RNN (QRNN), but here is what’s unclear to me:
Why do they derive a backprop algorithm specifically for quaternions? Shouldn’t automatic differentiation frameworks like PyTorch or TensorFlow handle that automatically, at least when the activation function is the one they used? In the end, I guess everything is implemented with “real”-valued matrices, so if the forward pass follows quaternion multiplication, why would an automatically derived gradient not work? So my question is: is there any difference between their backward pass and the one PyTorch derives, apart from computational speed or memory consumption? If I understand correctly, the copy task in the GitHub repo (https://github.com/Orkis-Research/Quaternion-Recurrent-Neural-Networks) even uses the PyTorch backward pass by default?
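To make the question concrete, here is a toy sketch of what I mean. It is my own example, not code from the paper or the repo. It uses a single quaternion weight w, a single input x, and a squared-error loss. The closed-form gradient 2 (w ⊗ x − t) ⊗ conj(x) is my own derivation for this toy loss, and central finite differences over the four real components stand in for what I would expect autograd to compute. All function and variable names are made up for illustration.

```python
# Toy check: quaternions stored as plain 4-tuples of reals (r, i, j, k).
# Assumption: a single weight w, input x, target t, loss = |w ⊗ x - t|^2.

def hamilton(p, q):
    """Hamilton product p ⊗ q written out on the real components."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,
        pr * qi + pi * qr + pj * qk - pk * qj,
        pr * qj - pi * qk + pj * qr + pk * qi,
        pr * qk + pi * qj - pj * qi + pk * qr,
    )

def conj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    r, i, j, k = q
    return (r, -i, -j, -k)

def loss(w, x, t):
    """Real-valued squared error between w ⊗ x and the target t."""
    y = hamilton(w, x)
    return sum((a - b) ** 2 for a, b in zip(y, t))

def quaternion_grad(w, x, t):
    """Hand-derived gradient in quaternion form: 2 * (w ⊗ x - t) ⊗ conj(x)."""
    e = tuple(a - b for a, b in zip(hamilton(w, x), t))
    return tuple(2.0 * c for c in hamilton(e, conj(x)))

def real_grad(w, x, t, h=1e-5):
    """Central differences over the four real components of w.
    This is a stand-in for autograd, not autograd itself."""
    grad = []
    for n in range(4):
        w_plus, w_minus = list(w), list(w)
        w_plus[n] += h
        w_minus[n] -= h
        grad.append((loss(w_plus, x, t) - loss(w_minus, x, t)) / (2 * h))
    return tuple(grad)

# Arbitrary example values.
w = (0.5, -1.0, 0.25, 2.0)
x = (1.0, 0.5, -0.5, 0.75)
t = (0.0, 1.0, 0.0, -1.0)

g_quat = quaternion_grad(w, x, t)
g_real = real_grad(w, x, t)
max_diff = max(abs(a - b) for a, b in zip(g_quat, g_real))
print("quaternion-form gradient:", g_quat)
print("component-wise gradient: ", g_real)
print("max abs difference:      ", max_diff)
```

In this toy case the two gradients agree up to numerical error, which is why I would expect a framework-derived backward pass to give the same numbers as a quaternion-specific one.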
Thanks for any answer or intuition that either confirms what I think or contradicts me.