[D]ual Numbers and Automatic Differentiation explained
For those struggling to understand AD and dual numbers, this article might help :)!
If I understand correctly, AD computes the derivative of each primitive operator with an analytically exact method (dual numbers, for example) and combines the results using the chain rule. This is feasible because any function we can program is built from “primitive” operators whose derivatives are known analytically, so the derivative of the whole program can be computed algorithmically.
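To make that concrete, here is a minimal forward-mode sketch in Python. It is not the article's code, just my own illustration: the names `Dual`, `_lift`, `sin`, and `derivative` are made up for this example, and only addition, multiplication, and sine are supported. Each primitive carries its value together with its derivative, and the chain rule is applied inside every operator.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical dual number a + b*eps with eps**2 == 0."""
    val: float  # value of the expression
    eps: float  # derivative of the expression w.r.t. the input

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.eps * other.val + self.val * other.eps)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    """Primitive with a known derivative: (sin u)' = cos(u) * u'."""
    x = _lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.eps)


def derivative(f, x):
    """Seed the input with derivative 1 and read off the eps part."""
    return f(Dual(float(x), 1.0)).eps


def f(x):
    # An ordinary-looking function built only from supported primitives.
    return x * x + 3 * sin(x)


# Analytically, f'(x) = 2*x + 3*cos(x); compare against the dual-number result.
print(derivative(f, 2.0), 2 * 2.0 + 3 * math.cos(2.0))
```

The point is that `f` is written as plain code, and the derivative comes out exact (up to floating point) rather than approximated by finite differences.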
I noticed the author has another multivariate example.
This was also helpful.