
Category: Reddit MachineLearning

[D] How do you deal with the pressure of such a fast moving field?

The pace of progress in ML is getting crazy, and I've been feeling super stressed about it lately. It also didn't help that a few ICML submissions do exactly what I had planned to do during my PhD, which I started recently (I was an ML engineer before).

There are just so many people working on similar ideas, I find it hard to keep up and contribute original ideas. How do you deal with this?

submitted by /u/vakker00

[D] Five major deep learning papers by Geoff Hinton did not cite similar earlier work by Jurgen Schmidhuber

still milking Jurgen's very dense inaugural tweet about their annus mirabilis 1990-1991 with Sepp Hochreiter and others. 2 of its 21 sections already made for nice reddit threads: section 5, Jurgen really had GANs in 1990, and section 19, DanNet, the CUDA CNN of Dan Ciresan in Jurgen's team, won 4 image recognition challenges prior to AlexNet. but these are not the juiciest parts of the blog post

instead, look at sections 1, 2, 8, 9 and 10, where Jurgen mentions work they did long before Geoff, who did not cite it, as confirmed by studying the references. at first glance it's not obvious; it's hidden, and one has to work backwards from the references

in section 1, First Very Deep NNs, Based on Unsupervised Pre-Training (1991), Jurgen "facilitated supervised learning in deep RNNs by unsupervised pre-training of a hierarchical stack of RNNs" and soon was able to "solve previously unsolvable Very Deep Learning tasks of depth > 1000." he mentions reference [UN4], which is actually Geoff's later similar work:

More than a decade after this work [UN1], a similar method for more limited feedforward NNs (FNNs) was published, facilitating supervised learning by unsupervised pre-training of stacks of FNNs called Deep Belief Networks (DBNs) [UN4]. The 2006 justification was essentially the one I used in the early 1990s for my RNN stack: each higher level tries to reduce the description length (or negative log probability) of the data representation in the level below.

back then, unsupervised pre-training was a big deal; today it's not so important any more, see section 19, From Unsupervised Pre-Training to Pure Supervised Learning (1991-95 and 2006-11)

in section 2, Compressing / Distilling one Neural Net into Another (1991), Jurgen also trained "a student NN to imitate the behavior of the teacher NN," briefly referring to Geoff's much later similar work [DIST2]:

I called this “collapsing” or “compressing” the behavior of one net into another. Today, this is widely used, and also called “distilling” [DIST2] or “cloning” the behavior of a teacher net into a student net.
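for readers who want the mechanism rather than the history: a minimal sketch of compressing a teacher net into a student via soft targets, in the modern temperature-softmax style rather than the exact 1991 formulation (the logits below are made up):

```python
import numpy as np

def softmax(z, T=1.0):
    # temperature-scaled softmax; higher T gives softer targets
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# hypothetical logits from a trained teacher and an untrained student
teacher_logits = np.array([[4.0, 1.0, 0.2]])
student_logits = np.array([[1.0, 1.2, 0.9]])

T = 2.0
p_teacher = softmax(teacher_logits, T)  # soft targets from the teacher
p_student = softmax(student_logits, T)

# cross-entropy of the student against the teacher's soft targets;
# minimizing this "compresses" the teacher's behavior into the student
distill_loss = -(p_teacher * np.log(p_student)).sum(axis=-1).mean()
print(distill_loss)
```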

in section 9, Learning Sequential Attention with NNs (1990), Jurgen "had both of the now common types of neural sequential attention: end-to-end-differentiable "soft" attention (in latent space) through multiplicative units within NNs [FAST2], and "hard" attention (in observation space) in the context of Reinforcement Learning (RL)" [ATT0] [ATT1]. the blog has a statement about Geoff's later similar work [ATT3] which I find both funny and sad:

My overview paper for CMSS 1990 [ATT2] summarised in Section 5 our early work on attention, to my knowledge the first implemented neural system for combining glimpses that jointly trains a recognition & prediction component with an attentional component (the fixation controller). Two decades later, the reviewer of my 1990 paper wrote about his own work as second author of a related paper [ATT3]: “To our knowledge, this is the first implemented system for combining glimpses that jointly trains a recognition component … with an attentional component (the fixation controller).”
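the soft/hard distinction is easy to see in code; a toy sketch with made-up shapes, using dot-product scoring as a stand-in for the multiplicative units:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical: 5 candidate glimpse locations, each with an 8-d feature vector
feats = rng.normal(size=(5, 8))
query = rng.normal(size=8)

# "soft" attention: fully differentiable, stays in latent space;
# every location contributes, weighted by the attention distribution
w = softmax(feats @ query)
glimpse_soft = w @ feats       # gradients flow through the whole thing

# "hard" attention: actually pick one location to observe;
# the discrete choice is non-differentiable, hence trained with RL
idx = rng.choice(5, p=w)
glimpse_hard = feats[idx]
```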

similarly, in section 10, Hierarchical Reinforcement Learning (1990), Jurgen introduced HRL "with end-to-end differentiable NN-based subgoal generators [HRL0], also with recurrent NNs that learn to generate sequences of subgoals" [HRL1] [HRL2], referring to Geoff's later work [HRL3]:

Soon afterwards, others also started publishing on HRL. For example, the reviewer of our reference [ATT2] (which summarised in Section 6 our early work on HRL) was last author of ref [HRL3]

in section 8, End-To-End-Differentiable Fast Weights: NNs Learn to Program NNs (1991), Jurgen published a network "that learns by gradient descent to quickly manipulate the fast weight storage" of another network, with "active control of fast weights through 2D tensors or outer product updates" [FAST2], dryly referring to [FAST4a], which happens to be Geoff's later similar paper:

A quarter century later, others followed this approach [FAST4a]
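the outer-product mechanism itself fits in a few lines; a toy sketch with random stand-in weights and hypothetical sizes, not the 1991 network:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3

# fast weight storage used by the "fast" net
fast_W = np.zeros((d_out, d_in))

# parameters of the "slow" net that programs the fast net
# (learned by gradient descent in the real model; random here)
U_a = rng.normal(size=(d_out, d_in))
U_b = rng.normal(size=(d_in, d_in))

for t in range(5):
    x = rng.normal(size=d_in)
    # the slow net emits two vectors; their outer product is a rank-1
    # rewrite of the fast weights (the "2D tensor" update)
    a, b = np.tanh(U_a @ x), np.tanh(U_b @ x)
    fast_W += np.outer(a, b)
    # the fast net then processes the input with its freshly programmed weights
    y = np.tanh(fast_W @ x)
print(y)
```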

it's really true, Geoff did not cite Jurgen in any of these similar papers. and what's kinda crazy: he was editor of Jurgen's 1990 paper [ATT2], which summarised both attention learning and hierarchical RL, and later published closely related work (sections 9, 10), but he did not cite it

Jurgen also famously complained that Geoff’s deep learning survey in Nature neither mentions the inventors of backpropagation (1960-1970) nor “the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks” in 1965

apart from the early pioneers of the 60s and 70s, like Ivakhnenko and Fukushima, most of the big deep learning concepts stem from Jurgen's team with Sepp and Alex and Dan and others: unsupervised pre-training of deep networks, artificial curiosity and GANs, vanishing gradients, LSTM for language processing and speech and everything, distilling networks, attention learning, CUDA CNNs that win vision contests, deep nets with 100+ layers, metalearning, plus theoretical work on optimal AGI and the Gödel Machine

submitted by /u/siddarth2947

[D] Can Recurrent Neural Networks have loops that go backward?

https://youtu.be/oJNHXPs0XDk?t=333

I was watching this video where the guy says that RNNs can have loopbacks (5:33).

I always thought they were called “Recurrent” because they have units that can be appended over and over to an architecture to form a sequence.

Is it really correct to say that they can have feedback loops to a previous layer and that “it’s not just a feedforward network”?

Thanks
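for what it's worth, the standard picture is that the recurrence is the hidden state feeding back into itself across time steps, not a backward edge inside a single feedforward pass; a minimal sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 3, 5

# hypothetical weights of a vanilla RNN cell
W_xh = rng.normal(size=(d_h, d_x))
W_hh = rng.normal(size=(d_h, d_h))  # the feedback loop: h feeds back into h

h = np.zeros(d_h)
xs = rng.normal(size=(4, d_x))      # a length-4 input sequence

for x in xs:
    # the same weights are reused at every step; unrolled over time this
    # looks like a deep feedforward net, but the loop through W_hh is what
    # makes it recurrent rather than "just a feedforward network"
    h = np.tanh(W_xh @ x + W_hh @ h)
print(h)
```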

submitted by /u/adkyary

[R] Task-Oriented Language Grounding for Language Input with Multiple Sub-Goals of Non-Linear Order

arxiv: https://arxiv.org/abs/1910.12354

github: https://github.com/vkurenkov/language-grounding-multigoal

Abstract:

In this work, we analyze the performance of general deep reinforcement learning algorithms on a task-oriented language grounding problem, where the language input contains multiple sub-goals and their order of execution is non-linear. We generate a simple instructional language for the GridWorld environment, built around three language elements (order connectors) defining the order of execution: one linear – “comma” – and two non-linear – “but first” and “but before”. We apply one of the deep reinforcement learning baselines – Double DQN with frame stacking – and ablate several extensions such as Prioritized Experience Replay and the Gated-Attention architecture. Our results show that the introduction of non-linear order connectors improves the success rate on instructions with a higher number of sub-goals by a factor of 2-3, but it still does not exceed 20%. We also observe that Gated-Attention provides no competitive advantage over concatenation in this setting. Source code and experiments’ results are available at https://github.com/vkurenkov/language-grounding-multigoal
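for readers unfamiliar with the Gated-Attention mechanism being ablated here: it fuses the instruction embedding with the visual features by element-wise gating, versus the plain concatenation baseline; a minimal sketch with hypothetical shapes, not the authors’ code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

C, H, W = 16, 7, 7           # hypothetical conv feature map of the GridWorld frame
visual = rng.normal(size=(C, H, W))
instr = rng.normal(size=32)  # hypothetical instruction embedding (e.g. from a GRU)

# gated attention: project the instruction to one gate per visual channel,
# then scale each channel by its gate (Hadamard product)
W_gate = rng.normal(size=(C, 32))
gates = sigmoid(W_gate @ instr)            # shape (C,)
fused_ga = visual * gates[:, None, None]

# the concatenation baseline just tiles the instruction onto the feature map
instr_map = np.broadcast_to(instr[:, None, None], (32, H, W))
fused_cat = np.concatenate([visual, instr_map], axis=0)
print(fused_ga.shape, fused_cat.shape)
```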

submitted by /u/Lua_b

[P] Improving Music Recommendations – looking for users to take part!

I’m looking for user data for my Computer Science Master’s project, “Using Community Detection to Improve Music Recommendations”.

I’ll be using machine learning to examine user music data from Spotify with the aim of improving the songs people are recommended.

I’ve produced a web app where you can consent to data being (anonymously) sampled from your Spotify account. It only takes about 1 minute to log in and would really help me out.

Thanks!

https://james-atkin-spotify-project.herokuapp.com/

submitted by /u/FeldsparKnight

[D] Self-training with Noisy Student improves ImageNet classification (STNS)

A few questions about this paper:

  1. If you train an ensemble of SOTA architectures on ImageNet and average their results, do you beat STNS?
  2. Why not fine-tune the teacher? Why involve the student at all? Why not have the teacher fine-tune with noisy labels and drop the student completely?
  3. The noise applied to the student seems odd to me. Why would this work, other than that adding noise sort of anneals the solution? Why not add noise to the gradient instead, or do what I suggest in 2?

I see Q Le has investigated noisy gradients already. https://arxiv.org/pdf/1511.06807.pdf
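for context, the teacher-student loop the questions are about, as a minimal runnable sketch: sklearn stand-ins and synthetic data, not the paper’s EfficientNet setup, with input noise standing in for the paper’s RandAugment/dropout/stochastic depth:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# synthetic stand-ins for labeled ImageNet and unlabeled JFT
X_lab, y_lab = make_classification(n_samples=200, n_features=20, random_state=0)
X_unlab, _ = make_classification(n_samples=2000, n_features=20, random_state=1)

# 1. train the teacher on labeled data only (no noise)
teacher = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

for _ in range(3):
    # 2. the teacher pseudo-labels the unlabeled data
    pseudo = teacher.predict(X_unlab)
    # 3. train a noised student on labeled + pseudo-labeled data
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    X_noisy = X_all + rng.normal(scale=0.5, size=X_all.shape)
    student = LogisticRegression(max_iter=1000).fit(X_noisy, y_all)
    # 4. the student becomes the next teacher, and the loop repeats
    teacher = student
```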

submitted by /u/idg101

[P] Predict figure skating world championship ranking from season performances (part 6: rank aggregation)

I’m trying to predict the ranking of figure skaters in the annual world championship from their scores in earlier competition events in the season. The obvious method is to average the scores for each skater across past events and rank them by those averages. However, since no two events are the same, the goal of my project is to separate the skater effect (the intrinsic ability of each skater) from the event effect (how an event influences the score of a skater).

In the previous 5 parts of my project, I developed several models to predict the ranking of skaters (as outlined in an earlier Reddit post). In this last part, I try to combine these rankings into a final ranking that will hopefully be more accurate than any of the previous rankings individually. You can read the write-up for it here.

I used two different approaches to combine the rankings:

  • An unsupervised approach using the centuries-old Borda count method for tallying ranked votes (a minimal sketch follows this list).

  • A supervised approach using logistic regression to combine the scores from each model more intelligently, using the world championship itself as a guide.
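in case the Borda count is unfamiliar: each ranking awards a skater points equal to the number of skaters ranked below them, and the totals give the aggregate ranking; a minimal sketch with made-up rankings, not the project’s code:

```python
from collections import defaultdict

# three hypothetical model rankings, best skater first
rankings = [
    ["Hanyu", "Chen", "Uno", "Zhou"],
    ["Chen", "Hanyu", "Zhou", "Uno"],
    ["Hanyu", "Uno", "Chen", "Zhou"],
]

scores = defaultdict(int)
for ranking in rankings:
    n = len(ranking)
    for place, skater in enumerate(ranking):
        # a skater in position `place` beats n - 1 - place others
        scores[skater] += n - 1 - place

aggregate = sorted(scores, key=scores.get, reverse=True)
print(aggregate)  # ['Hanyu', 'Chen', 'Uno', 'Zhou']
```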

Finally, all 7 ranking models developed in my project are benchmarked on the 5 seasons in the test set. I won’t spoil the details and explanations of the final result (you can see a glimpse of it here), but let’s just say that predicting sports is hard AF!

You can check out the GitHub repo of the project for all my analyses. I’m more than happy to answer any questions or feedback you might have about my project. Thank you for taking the time to read it.

submitted by /u/seismatica