
[D] Jurgen Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970

still mining Jurgen’s dense blog post on the miraculous year 1990-1991, a rich resource for reddit threads; see exhibits A, B, C

everybody in deep learning uses backpropagation, but many don’t know who invented it; the blog links a separate page on this, which says:

Its modern version (also called the reverse mode of automatic differentiation) was first published in 1970 by Finnish master student Seppo Linnainmaa

whose thesis [BP1] introduced the algorithm five decades ago, in Finnish; an English version is linked there

In the course of many trials, Seppo Linnainmaa’s gradient-computing algorithm of 1970 [BP1], today often called backpropagation or the reverse mode of automatic differentiation is used to incrementally weaken certain NN connections and strengthen others, such that the NN behaves more and more like the teacher
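the “weaken certain NN connections and strengthen others” part is plain gradient descent on the weights; a minimal numpy sketch (the teacher, data sizes, and learning rate here are made-up illustrations, not from the blog):

```python
import numpy as np

# a one-layer linear "student" network learns to imitate a fixed "teacher"
# by repeatedly nudging each weight against its error gradient
rng = np.random.default_rng(0)
teacher_w = np.array([2.0, -3.0])       # hypothetical teacher weights
X = rng.normal(size=(100, 2))
y = X @ teacher_w                       # teacher outputs on random inputs

w = np.zeros(2)                         # student starts knowing nothing
lr = 0.1
for _ in range(100):
    err = X @ w - y                     # prediction error
    grad = X.T @ err / len(X)           # dLoss/dw for mean squared error
    w -= lr * grad                      # weaken/strengthen each connection

print(w)  # approaches teacher_w
```

the gradient used in the update is exactly what backpropagation computes; in this one-layer case it is a single closed-form expression, and reverse mode is what makes the same computation cheap for deep networks.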

Jurgen’s scholarpedia article on deep learning also cites an earlier paper by Kelley (Gradient Theory of Optimal Flight Paths, 1960) which already had the recursive chain rule for continuous systems, and papers by Bryson 1961 and Dreyfus 1962:

BP’s continuous form was derived in the early 1960s (Kelley, 1960; Bryson, 1961; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only.

however, that was not yet Seppo Linnainmaa’s

explicit, efficient error backpropagation (BP) in arbitrary, discrete, possibly sparsely connected, NN-like networks

BP’s modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Here the complexity of computing the derivatives of the output error with respect to each weight is proportional to the number of weights. That’s the method still used today.
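the key property quoted above — all derivatives from one backward sweep, at cost proportional to the size of the network — can be sketched in a few lines; this toy reverse-mode tape is my own illustration (class and function names are hypothetical, and obviously not Linnainmaa’s FORTRAN):

```python
class Var:
    """A graph node: forward pass records value and local derivatives."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents          # (parent, local_derivative) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])


def backward(output):
    """One reverse sweep: chain rule applied once per edge of the graph."""
    order, seen = [], set()             # topological order of the graph

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):        # propagate from output to inputs
        for parent, local in node.parents:
            parent.grad += local * node.grad


# f(a, b) = a*b + a
a, b = Var(3.0), Var(4.0)
f = a * b + a
backward(f)
print(f.value, a.grad, b.grad)  # df/da = b + 1 = 5, df/db = a = 3
```

one forward pass plus one backward pass touches each edge a constant number of times, which is why the cost is proportional to the number of weights rather than (number of weights) × (number of outputs).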

Jurgen’s comprehensive survey also cites Andreas Griewank, the godfather of automatic differentiation, who writes:

Nick Trefethen [13] listed automatic differentiation as one of the 30 great numerical algorithms of the last century… Seppo Linnainmaa (Lin76) of Helsinki says the idea came to him on a sunny afternoon in a Copenhagen park in 1970…

starting on page 391, Griewank’s survey explains in detail what Linnainmaa did; it’s really illuminating

Gerardi Ostrowski came a tad too late: he published reverse mode backpropagation in 1971, in German, one year after Linnainmaa; hey, publish first or perish

the scholarpedia article also says:

Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients.

later, Paul Werbos was the first to apply this to neural networks, not in 1974, as some say, but in 1982:

Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis, which did not yet have Linnainmaa’s modern, efficient form of BP.

Jurgen famously complained that Yann & Yoshua & Geoff did not mention the inventors of backpropagation

They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago.

astonishingly, the recent Turing award laudation refers to Yann’s variants of backpropagation and Geoff’s computational experiments with backpropagation, without clarifying that the method was invented by others

in the GAN thread someone wrote that “LeCun quipped that backpropagation was invented by Leibniz because it’s just the chain rule of derivation,” but that’s a red herring: Linnainmaa’s reverse mode is more specific than the plain chain rule, it is the efficient recursive application of the chain rule to computation graphs, and Leibniz did not have that
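the distinction can be made concrete: the chain rule alone does not fix the order in which the per-layer Jacobians get multiplied, and for a scalar loss the right-to-left (output-to-input) order is what makes backpropagation cheap; a rough numpy sketch, with purely illustrative sizes:

```python
import numpy as np

# three "layer Jacobians" of a hypothetical deep network, and the
# gradient v of a scalar loss w.r.t. the final activations
rng = np.random.default_rng(0)
n = 200
J1, J2, J3 = (rng.normal(size=(n, n)) for _ in range(3))
v = rng.normal(size=n)

# reverse mode: vector-Jacobian products right to left,
# three matrix-vector multiplies, O(3 * n^2) flops
g_reverse = ((v @ J3) @ J2) @ J1

# input-to-output order: full Jacobian products first,
# two matrix-matrix multiplies, O(2 * n^3) flops
g_naive = v @ (J3 @ (J2 @ J1))

print(np.allclose(g_reverse, g_naive))  # same gradient, very different cost
```

both orders are “just the chain rule”, but only one of them scales to millions of weights — that ordering insight is what Linnainmaa published.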

section 3 of the blog mentions Linnainmaa again in the context of Sepp Hochreiter’s 1991 thesis VAN1 which

formally showed that deep NNs suffer from the now famous problem of vanishing or exploding gradients: in typical deep or recurrent networks, back-propagated error signals either shrink rapidly, or grow out of bounds. In both cases, learning fails… Note that Sepp’s thesis identified those problems of backpropagation in deep NNs two decades after another student with a similar first name (Seppo Linnainmaa) published modern backpropagation or the reverse mode of automatic differentiation in his own thesis of 1970 [BP1].
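the shrink-or-explode behavior is easy to see numerically: a backpropagated error signal picks up one Jacobian scaling factor per layer, so its magnitude is roughly (typical factor)^depth; a toy sketch with made-up factors, not numbers from the thesis:

```python
def backprop_signal(factor, depth=50):
    """Magnitude of an error signal after `depth` layers, each scaling
    it by `factor` (a stand-in for the per-layer Jacobian norm)."""
    signal = 1.0
    for _ in range(depth):
        signal *= factor
    return signal

print(backprop_signal(0.9))   # shrinks rapidly toward 0 (vanishing)
print(backprop_signal(1.1))   # grows out of bounds (exploding)
```

only factors pinned near 1 let the signal survive 50 layers, which is the observation that later motivated architectures like LSTM.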

submitted by /u/siddarth2947