[D] Jurgen Schmidhuber on Alexey Ivakhnenko, godfather of deep learning 1965
In 1965, Ivakhnenko and Lapa published the first general, working learning algorithm for supervised deep feedforward multilayer perceptrons [A0], with arbitrarily many layers of neuron-like elements using nonlinear activation functions built from additions (i.e., linear perceptrons) and multiplications (i.e., gates). They incrementally trained and pruned their network layer by layer to learn internal representations, using regression and a separate validation set. (They did not call this a neural network, but that’s what it was.) For example, Ivakhnenko’s 1971 paper already described a deep learning net with 8 layers, trained by their highly cited method (the “Group Method of Data Handling”, GMDH), which was still popular in the new millennium, especially in Eastern Europe, where much of machine learning was born.
That is, Minsky & Papert’s later 1969 book about the limitations of shallow single-layer nets (“Perceptrons”) addressed a “problem” that had already been solved four years earlier 🙂 Maybe Minsky did not even know, but he should have. Some claim that Minsky’s book killed NN-related research, but of course it didn’t, at least not outside the US.
Like later deep NNs, Ivakhnenko’s nets learned to create hierarchical, distributed, internal representations of incoming data.
And his blog says:
In surveys from the Anglosphere it does not always become clear [DLC] that Deep Learning was invented where English is not an official language. It started in 1965 in the Ukraine (back then the USSR) with the first nets of arbitrary depth that really learned
The link in the quote is Juergen’s famous critique of Yann & Yoshua & Geoff, who failed to cite Ivakhnenko even though they should have known his work, which was prominently featured in Juergen’s earlier deep learning survey. It looks as if they wanted to credit Geoff for learning internal representations, although Ivakhnenko & Lapa did this 20 years earlier. Geoff’s 2006 paper on layer-wise training in deep belief networks also did not cite Ivakhnenko’s layer-wise training, and neither did Yoshua’s deep learning book. How crazy is that: a book that fails to mention the very inventors of its very topic.
I have also seen several recent papers on pruning deep networks, but few cite Ivakhnenko & Lapa, who did this first. I bet this will change; science is self-healing.
Notably, Ivakhnenko did not use backpropagation but regression to adjust the weights layer by layer, both for linear units and for “gates” with polynomial activation functions.
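To make the idea concrete, here is a minimal sketch of GMDH-style layer-wise training, not Ivakhnenko’s exact procedure: every function name and the particular quadratic two-input unit are my own assumptions for illustration. Each candidate unit is a small polynomial of two inputs fitted by ordinary least squares (regression, no backpropagation), units are pruned on a separate validation set, and the survivors’ outputs become the next layer’s inputs.

```python
# Hypothetical sketch of GMDH-style layer-wise training (my assumptions:
# quadratic two-input units, least-squares fitting, validation-based pruning).
import numpy as np

rng = np.random.default_rng(0)

def basis(a, b):
    # Polynomial basis of one unit on inputs a, b:
    # y = w0 + w1*a + w2*b + w3*a*b + w4*a^2 + w5*b^2
    return np.stack([np.ones_like(a), a, b, a * b, a * a, b * b], axis=1)

def fit_unit(a, b, y):
    # "Regression, not backpropagation": ordinary least squares per unit.
    w, *_ = np.linalg.lstsq(basis(a, b), y, rcond=None)
    return w

def train_gmdh(Xtr, ytr, Xva, yva, width=4, max_layers=5):
    best_err = np.inf
    for _ in range(max_layers):
        # Fit one candidate unit for every pair of current inputs.
        candidates = []
        for i in range(Xtr.shape[1]):
            for j in range(i + 1, Xtr.shape[1]):
                w = fit_unit(Xtr[:, i], Xtr[:, j], ytr)
                err = np.mean((basis(Xva[:, i], Xva[:, j]) @ w - yva) ** 2)
                candidates.append((err, i, j, w))
        # Prune: keep only the units with the lowest validation error.
        candidates.sort(key=lambda c: c[0])
        kept = candidates[:width]
        if kept[0][0] >= best_err:
            break  # stop deepening once validation error no longer improves
        best_err = kept[0][0]
        # Outputs of surviving units become the next layer's inputs.
        Xtr = np.stack([basis(Xtr[:, i], Xtr[:, j]) @ w for _, i, j, w in kept], axis=1)
        Xva = np.stack([basis(Xva[:, i], Xva[:, j]) @ w for _, i, j, w in kept], axis=1)
    return best_err
```

The validation set plays the role described above: it decides both which units survive pruning and when to stop adding layers, so depth is chosen by the data rather than fixed in advance.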
Five years later, modern backpropagation was published “next door” in Finland.
We already had a reddit discussion on Seppo Linnainmaa, inventor of backpropagation in 1970.
Anyway, all hail Alexey Ivakhnenko and Valentin Lapa, who built the first deep learning feedforward networks with many hidden layers. Too bad they are dead, so no award for them.