[D] Jurgen Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970
everybody in deep learning uses backpropagation, but many don’t know who invented it; Schmidhuber’s blog links to a separate web page on this, which says
Its modern version (also called the reverse mode of automatic differentiation) was first published in 1970 by Finnish master’s student Seppo Linnainmaa
In the course of many trials, Seppo Linnainmaa’s gradient-computing algorithm of 1970 [BP1], today often called backpropagation or the reverse mode of automatic differentiation, is used to incrementally weaken certain NN connections and strengthen others, such that the NN behaves more and more like the teacher
Jurgen’s scholarpedia article on deep learning also cites an earlier paper by Kelley (Gradient Theory of Optimal Flight Paths, 1960) which already had the recursive chain rule for continuous systems, and papers by Bryson 1961 and Dreyfus 1962:
BP’s continuous form was derived in the early 1960s (Kelley, 1960; Bryson, 1961; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only.
however, that was not yet Seppo Linnainmaa’s
explicit, efficient error backpropagation (BP) in arbitrary, discrete, possibly sparsely connected, NN-like networks
BP’s modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Here the complexity of computing the derivatives of the output error with respect to each weight is proportional to the number of weights. That’s the method still used today.
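that “proportional to the number of weights” claim is the whole point of reverse mode: one forward pass records the graph, one backward sweep delivers the derivative of the output with respect to every input at once. here’s a minimal sketch (class and function names are illustrative, not from Linnainmaa’s FORTRAN or any library):

```python
# Minimal reverse-mode automatic differentiation sketch (hypothetical
# names). One forward pass builds the graph and stores local partial
# derivatives; one backward sweep accumulates d(output)/d(input) for
# ALL inputs, so the cost is proportional to the number of operations
# (i.e. weights), not to the number of inputs.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent Var, local partial derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def backward(output):
    # topological order, so each node's adjoint is complete
    # before it is propagated to its parents
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # adjoint accumulation

# f(w1, w2) = w1*w2 + w2
w1, w2 = Var(3.0), Var(4.0)
y = w1 * w2 + w2
backward(y)
print(w1.grad, w2.grad)  # df/dw1 = w2 = 4.0, df/dw2 = w1 + 1 = 4.0
```

note that w2 feeds into two operations and its adjoint is simply summed over both uses; that accumulation over a shared node is what makes the single backward sweep sufficient.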
Nick Trefethen listed automatic differentiation as one of the 30 great numerical algorithms of the last century… Seppo Linnainmaa (Lin76) of Helsinki says the idea came to him on a sunny afternoon in a Copenhagen park in 1970…
starting on page 391, Griewank’s survey explains in detail what Linnainmaa did; it’s really illuminating
Gerardi Ostrowski came a tad too late: he published reverse mode backpropagation in 1971, in German, one year after Linnainmaa. hey, publish first or perish
the scholarpedia article also says:
Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients.
later Paul Werbos was the first to apply this to neural networks, not in 1974, as some say, but in 1982:
Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis, which did not yet have Linnainmaa’s modern, efficient form of BP.
Jurgen famously complained that Yann & Yoshua & Geoff did not mention the inventors of backpropagation:
They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago.
astonishingly, the recent Turing award laudation refers to Yann’s variants of backpropagation and Geoff’s computational experiments with backpropagation, without clarifying that the method was invented by others
in the GAN thread someone wrote that “LeCun quipped that backpropagation was invented by Leibniz because it’s just the chain rule of derivation”, but that’s a red herring: Linnainmaa’s reverse mode backpropagation is more specific than that. it is the efficient recursive chain rule for graphs, which Leibniz did not have
the blog also notes that Sepp Hochreiter’s 1991 diploma thesis formally showed that deep NNs suffer from the now famous problem of vanishing or exploding gradients: in typical deep or recurrent networks, back-propagated error signals either shrink rapidly, or grow out of bounds. In both cases, learning fails… Note that Sepp’s thesis identified those problems of backpropagation in deep NNs two decades after another student with a similar first name (Seppo Linnainmaa) published modern backpropagation or the reverse mode of automatic differentiation in his own thesis of 1970 [BP1].
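the vanishing/exploding effect is easy to reproduce numerically: the backpropagated error picks up one multiplicative Jacobian factor per layer, so a factor below 1 in magnitude shrinks it geometrically and a factor above 1 blows it up. a toy illustration (made-up numbers, not from the thesis):

```python
# Toy illustration of vanishing/exploding gradients: the error signal
# is multiplied by one (scalar, made-up) Jacobian factor per layer.

def backpropagated_error(factor, depth, err=1.0):
    for _ in range(depth):
        err *= factor            # one multiplicative factor per layer
    return err

print(backpropagated_error(0.5, 50))   # ~8.9e-16: vanishes
print(backpropagated_error(1.5, 50))   # ~6.4e8: explodes
```

fifty layers is enough to push the error signal below machine epsilon in one case and past a hundred million in the other, which is why plain backpropagation through very deep or recurrent nets fails without further tricks.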