[D] Neural Differential Equations
I had a question about these. I know that you calculate the whole network in one go and then just evaluate it at some points along the depth. However, I was wondering how the parameters work.
1) How are the weights and biases updated? I know they are “shared” through the whole network (hence fewer parameters than a usual network), but how do the individual evaluations work then? Say the network is defined from t = 0 to t = 5, and I evaluate at t = 1 and t = 2: are the weights the same at both points, with t the only thing that changes? And if so, what’s the point? Why not just evaluate at the end point (i.e. the maximum depth you want)?
2) Going off of that, what is the point of those in-between evaluations if the parameters are shared anyway? Wouldn’t they be updated the same way every time? Or is it that multiple evaluations mean the derivatives and the updates are “better”?
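To make the setup concrete, here is a minimal sketch of how I picture it. It may well be wrong, and everything in it is made up for illustration: the scalar dynamics function `f`, the parameter tuple `theta`, the targets, the fixed-step Euler loop (a real implementation would use a proper ODE solver), and the finite-difference gradient (standing in for backprop or the adjoint method).

```python
import math

def f(h, t, theta):
    # Hypothetical dynamics dh/dt = tanh(w*h + u*t + b).
    # theta = (w, u, b) is the single shared parameter set; only h and t vary.
    w, u, b = theta
    return math.tanh(w * h + u * t + b)

def odeint_euler(h0, theta, t_eval, dt=0.01):
    # Fixed-step Euler starting at t = 0; returns h at each requested time.
    h, k = h0, 0
    out = []
    for t_target in sorted(t_eval):
        k_target = round(t_target / dt)
        while k < k_target:
            h += dt * f(h, k * dt, theta)  # same theta at every step
            k += 1
        out.append(h)
    return out

def loss(theta, h0, times, targets):
    # Squared error summed over the evaluation times (times assumed increasing).
    preds = odeint_euler(h0, theta, times)
    return sum((p - y) ** 2 for p, y in zip(preds, targets))

def grad_fd(theta, h0, times, targets, eps=1e-6):
    # Central finite differences with respect to each shared parameter.
    grads = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += eps
        down[i] -= eps
        diff = loss(up, h0, times, targets) - loss(down, h0, times, targets)
        grads.append(diff / (2 * eps))
    return grads

theta = (0.5, -0.1, 0.2)  # made-up weights
h0 = 0.0

traj = odeint_euler(h0, theta, [1.0, 2.0, 5.0])  # read out along the depth
end_only = odeint_euler(h0, theta, [5.0])        # read out only at the end

g_end = grad_fd(theta, h0, [5.0], [1.0])
g_multi = grad_fd(theta, h0, [1.0, 2.0, 5.0], [0.3, 0.6, 1.0])
```

At least in this toy version, the value at t = 5 is the same whether or not I also read out t = 1 and t = 2, since those are just points on one trajectory driven by the same `theta`. The gradient with respect to `theta` only differs between `g_end` and `g_multi` because the loss has extra terms at the intermediate times.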
I’m just really confused about this whole shared parameters thing. Please help!