Category: Reddit MachineLearning
[D] Jurgen Schmidhuber on Seppo Linnainmaa, inventor of backpropagation in 1970
still mining Jurgen’s dense blog post on their miraculous year 1990-1991, a rich resource for reddit threads, see exhibits A, B, C
everybody in deep learning is using backpropagation, but many don’t know who invented it; the blog has a separate website on this which says:
Its modern version (also called the reverse mode of automatic differentiation) was first published in 1970 by Finnish master’s student Seppo Linnainmaa
whose thesis introduced the algorithm five decades ago [BP1], in Finnish; English version here
In the course of many trials, Seppo Linnainmaa’s gradient-computing algorithm of 1970 [BP1], today often called backpropagation or the reverse mode of automatic differentiation, is used to incrementally weaken certain NN connections and strengthen others, such that the NN behaves more and more like the teacher
Jurgen’s scholarpedia article on deep learning also cites an earlier paper by Kelley (Gradient Theory of Optimal Flight Paths, 1960) which already had the recursive chain rule for continuous systems, and papers by Bryson 1961 and Dreyfus 1962:
BP’s continuous form was derived in the early 1960s (Kelley, 1960; Bryson, 1961; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only.
however, that was not yet Seppo Linnainmaa’s
explicit, efficient error backpropagation (BP) in arbitrary, discrete, possibly sparsely connected, NN-like networks
BP’s modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Here the complexity of computing the derivatives of the output error with respect to each weight is proportional to the number of weights. That’s the method still used today.
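to make that complexity claim concrete, here is a tiny tape-based sketch of reverse mode in plain Python (an illustration of the idea only, not Linnainmaa’s 1970 algorithm or FORTRAN code; the `Var` class and `backward` function are made up for this example): one forward pass records the graph, one backward sweep visits each connection once, so getting the derivative with respect to *every* weight costs about as much as a single forward pass

```python
# Minimal tape-based reverse-mode sketch. Each Var remembers its
# parents and the local partial derivative along each edge; backward()
# does one reverse sweep, touching every edge exactly once.

class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # list of (parent_var, local_gradient)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

def backward(output):
    # Topologically order the graph, then push gradients to parents.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += v.grad * local

# y = w1*x + w2*x with x=3, w1=2, w2=4, so dy/dw1 = dy/dw2 = 3
x, w1, w2 = Var(3.0), Var(2.0), Var(4.0)
y = w1 * x + w2 * x
backward(y)
print(w1.grad, w2.grad)  # -> 3.0 3.0
```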
Jurgen’s comprehensive survey also cites Andreas Griewank, godfather of automatic differentiation who writes
Nick Trefethen [13] listed automatic differentiation as one of the 30 great numerical algorithms of the last century… Seppo Linnainmaa (Lin76) of Helsinki says the idea came to him on a sunny afternoon in a Copenhagen park in 1970…
starting on page 391, Griewank’s survey explains in detail what Linnainmaa did, it’s really illuminating
Gerardi Ostrowski came a tad too late: he published reverse mode backpropagation in 1971, in German, one year after Linnainmaa; hey, publish first or perish
the scholarpedia article also says:
Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients.
later Paul Werbos was the first to apply this to neural networks, not in 1974, as some say, but in 1982:
Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis, which did not yet have Linnainmaa’s modern, efficient form of BP.
Jurgen famously complained that Yann & Yoshua & Geoff did not mention the inventors of backpropagation
They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago.
astonishingly, the recent Turing award laudation refers to Yann’s variants of backpropagation and Geoff’s computational experiments with backpropagation, without clarifying that the method was invented by others
in the GAN thread someone wrote that “LeCun quipped that backpropagation was invented by Leibniz because it’s just the chain rule of derivation”, but that’s a red herring: Linnainmaa’s reverse mode backpropagation is more specific than the bare chain rule, it is the efficient recursive chain rule for graphs, and Leibniz did not have that
section 3 of the blog mentions Linnainmaa again in the context of Sepp Hochreiter’s 1991 thesis VAN1 which
formally showed that deep NNs suffer from the now famous problem of vanishing or exploding gradients: in typical deep or recurrent networks, back-propagated error signals either shrink rapidly, or grow out of bounds. In both cases, learning fails… Note that Sepp’s thesis identified those problems of backpropagation in deep NNs two decades after another student with a similar first name (Seppo Linnainmaa) published modern backpropagation or the reverse mode of automatic differentiation in his own thesis of 1970 [BP1].
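the mechanism Sepp identified can be illustrated with a toy calculation (the 0.5 and 2.0 factors below are invented for the demo, not from his thesis): backpropagating through T steps multiplies the error signal by a Jacobian factor per step, so a factor consistently below 1 makes the gradient vanish and a factor above 1 makes it explode

```python
# Toy illustration of vanishing/exploding gradients: the error signal
# is scaled by the same per-step factor T times during backprop.

def backprop_signal(factor, steps, signal=1.0):
    for _ in range(steps):
        signal *= factor
    return signal

print(backprop_signal(0.5, 50))  # shrinks toward 0 (vanishing)
print(backprop_signal(2.0, 50))  # grows out of bounds (exploding)
```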
submitted by /u/siddarth2947
[link] [comments]
[P] Learning Rate Dropout in PyTorch
https://github.com/noahgolmant/pytorch-lr-dropout
I just implemented learning rate dropout using PyTorch! This technique applies dropout to the weight update at each iteration instead of the weights themselves.
I welcome any and all feedback! I ran four trials with a ResNet34 model on CIFAR-10 using both the baseline optimizer (SGD with momentum) and this variant. I wasn’t able to achieve the numbers reported in the paper, though. Feel free to double-check the masking logic or hyperparameters in case that explains the difference.
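for readers who haven’t seen the technique, here is a rough sketch of the idea as I understand it (this is not the linked repo’s code; the function name and signature are made up): sample a Bernoulli mask per parameter each iteration and apply it to the *update*, leaving the weights themselves untouched

```python
import random

def sgd_step_with_lr_dropout(weights, grads, lr=0.1, momentum=0.9,
                             keep_prob=0.5, velocity=None, rng=random):
    # Standard SGD-with-momentum, except each coordinate's update is
    # dropped (zeroed) with probability 1 - keep_prob. Unlike ordinary
    # dropout, the weights are never zeroed, only their updates.
    if velocity is None:
        velocity = [0.0] * len(weights)
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] + g
        mask = 1.0 if rng.random() < keep_prob else 0.0
        weights[i] -= lr * mask * velocity[i]
    return weights, velocity
```

with `keep_prob=1.0` this reduces to plain SGD with momentum, which is a handy sanity check when debugging the masking logic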
submitted by /u/noahgolm
[link] [comments]
[P] What could cause this behavior?
Hi,
I’m making an LSTM that takes a list of same-size vectors as input. These vectors are encodings of frames in a video, and I want the LSTM to output an encoding of the entire video. To get this encoding, I am just taking the last hidden state and feeding it through a linear layer.
My issue is that the hidden state seems to be converging to some fixed vector after a couple of time steps. It seems like the LSTM is forgetting previous states and getting stuck. What could cause this behavior? Is there a nice way to fix it?
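one possible reading of this symptom (an assumption, not a diagnosis of this particular model): if the gates saturate, the recurrent update acts like roughly the same contracting map at every step, and any contracting map drives the state to its fixed point regardless of the inputs, which looks exactly like “forgetting”; a toy scalar stand-in, with a made-up update rule:

```python
# Toy illustration (not an LSTM): repeatedly applying the contracting
# map h <- 0.5*h + 1 drives h to the map's fixed point (2.0) from any
# starting state, so the inputs' influence disappears within a few steps.

def run(h0, steps):
    h = h0
    for _ in range(steps):
        h = 0.5 * h + 1.0  # stand-in for a saturated recurrent update
    return h

print(run(10.0, 20), run(-7.0, 20))  # both converge near 2.0
```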
Thanks
submitted by /u/jsonathan
[link] [comments]
[P] Simple hyperparameter management through dependency injection
What an unruly mess some hyperparameter configurations are… In many open source deep learning codebases, the hyperparameters are treated as global variables, and it’s nothing new that global variables should be avoided. Yet, here we are.

Three years ago, I started as a junior deep learning engineer at Apple and I developed a similar approach to this one: https://www.reddit.com/r/MachineLearning/comments/e5jvhq/p_how_to_get_rid_of_boilerplate_ifstatements_and/. My team had to abandon it, though, because the solution required redundant boilerplate. Using YAML files was annoying too: YAML has little support for variables and no support for lambdas or Python objects. Lastly, it wasn’t easy to modularize YAML files the way Python functions can be modularized. Anywho, the above solutions just didn’t work.

Three years later, after tinkering and working at different companies as a deep learning engineer, I came up with this approach: https://github.com/PetrochukM/HParams. Here’s what it looks like:

The approach has a couple of benefits:
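to give a feel for the dependency-injection idea in general terms (this sketch is hypothetical and is NOT the HParams library’s actual API; `configure` and `configurable` are invented names): hyperparameters live in one registry, and a decorator injects them into any function that declares them, so no globals leak into the model code

```python
import functools
import inspect

# Hypothetical sketch of hyperparameter dependency injection.
_REGISTRY = {}

def configure(**hparams):
    """Register hyperparameter values in one central place."""
    _REGISTRY.update(hparams)

def configurable(fn):
    """Inject registered hyperparameters the caller didn't supply."""
    sig = inspect.signature(fn)
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind_partial(*args, **kwargs)
        for name in sig.parameters:
            if name not in bound.arguments and name in _REGISTRY:
                kwargs[name] = _REGISTRY[name]
        return fn(*args, **kwargs)
    return wrapper

@configurable
def make_optimizer(lr=0.01, momentum=0.0):
    return {"lr": lr, "momentum": momentum}

configure(lr=0.1, momentum=0.9)
print(make_optimizer())  # -> {'lr': 0.1, 'momentum': 0.9}
```

explicit caller arguments still win over the registry, so experiments can override any single hyperparameter locally without touching the config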
Anywho, let me know what you think! Lastly, it’s kinda cool that similar approaches to mine were discovered and implemented by the AllenNLP library and Google’s gin-config. Does that mean I’m doing something right?

submitted by /u/Deepblue129