
Category: Reddit MachineLearning

[P] Git hook for large files: because who wants to have their 100TB data file committed to Git?

The usual disclaimer – this is not my project, but it is simple and awesome so I wanted to share.

Check out this Git pre-commit hook for large files.

What it does:
Most people working on serious ML projects have probably hit this issue: you accidentally run git add ., and only after committing (or worse, pushing to the remote) do you realize that you added your ginormous model / data files to the repository.

If you’re a Git expert, you can definitely fix it. But why fix something you can avoid?

It’s super easy to install (only Linux/Mac are currently supported):

curl -L https://gist.github.com/guysmoilov/ddb3329e31b001c1e990e08394a08dc4/raw/install.sh | bash

By default it limits files to a maximum size of 5MB, but this can be configured with:

GIT_FILE_SIZE_LIMIT=42000000 git commit -m "This commit is allowed file sizes up to 42MB"

The hook itself is based on this Gist – which deserves credit as well.
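For anyone curious what such a hook boils down to, here is a rough Python sketch of the idea (my own illustration, not the actual Gist, which is a shell script): list the staged files, compare their sizes against the limit, and abort the commit if any file is too large. The default limit and the GIT_FILE_SIZE_LIMIT override mirror the behaviour described above.

#!/usr/bin/env python3
# Rough illustration of a large-file pre-commit hook (sketch only, not the real hook).
# Save as .git/hooks/pre-commit and make it executable.
import os
import subprocess
import sys

# Same 5MB default as above, overridable via the environment variable.
limit = int(os.environ.get("GIT_FILE_SIZE_LIMIT", 5_000_000))

# Files staged for the next commit (added, copied, or modified).
staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

too_big = [f for f in staged if os.path.isfile(f) and os.path.getsize(f) > limit]
if too_big:
    for f in too_big:
        print(f"error: {f} exceeds the {limit} byte limit", file=sys.stderr)
    print("Commit aborted; raise GIT_FILE_SIZE_LIMIT if this is intentional.", file=sys.stderr)
    sys.exit(1)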

Thanks to both developers! What other hooks do you use as part of your ML work?

submitted by /u/PhYsIcS-GUY227

[D] Starting with AI and ML

I recently followed a free MOOC from the University of Helsinki called ‘Elements of AI’. It’s an introductory-level course on AI, but it sparked my interest and I would like to learn more and maybe change my career. I am a senior Java developer and I have a background in engineering, so I think the learning curve will not be very steep.

I started learning Python with the book ‘Python Crash Course’ by Eric Matthes, although I find it very slow for someone who already knows another programming language. Is there a better book? I also started ‘Machine Learning’ by Andrew Ng on Coursera, which I find very good. I also plan to read Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, as it has good reviews, although I don’t know if it will be too advanced for me.

Is there a good platform for self-learning? Through my company I have free access to ‘safaribooksonline’ and ‘linkedin learning’, but I don’t mind paying if the content is worth it. What about Coursera, Udemy, edX, Udacity, Pluralsight, …? Are any of those worth it?

submitted by /u/jaimeloplis

[D] The 1997 LSTM paper by Hochreiter & Schmidhuber has become the most cited deep learning research paper of the 20th century

  • Long short-term memory. S Hochreiter, J Schmidhuber. Neural computation, MIT Press, 1997 (26k citations as of 2019)

It has passed the backpropagation papers by Rumelhart et al. (1985, 1986, 1987). Don’t get confused by Google Scholar, which sometimes incorrectly lumps together different Rumelhart publications, including:

  • Learning internal representations by error propagation. DE Rumelhart, GE Hinton, RJ Williams, California Univ San Diego La Jolla, Inst for Cognitive Science, 1985 (25k)

  • Parallel distributed processing. JL McClelland, DE Rumelhart, PDP Research Group, MIT press, 1987 (24k)

  • Learning representations by back-propagating errors. DE Rumelhart, GE Hinton, RJ Williams, Nature 323 (6088), 533-536, 1986 (19k)

I think it’s good that the backpropagation paper is no longer number one, because it’s a bad role model. It does not cite the true inventors of backpropagation, and the authors have never corrected this. I learned this on reddit: Schmidhuber on Linnainmaa, inventor of backpropagation in 1970. This post also mentions Kelley (1960) and Werbos (1982).

The LSTM paper is now receiving more citations per year than all of Rumelhart’s backpropagation papers combined. And more than the most cited paper by LeCun and Bengio (1998) which is about CNNs:

  • Gradient-based learning applied to document recognition. Y LeCun, L Bottou, Y Bengio, P Haffner, IEEE 86 (11), 2278-2324, 1998 (23k)

It may soon have more citations than Bishop’s textbook on neural networks (1995).

In the 21st century, activity in the field has surged, and I found three deep learning research papers with even more citations. All of them are about applications of neural networks to ImageNet (2012, 2014, 2015). One paper describes a fast, CUDA-based, deep CNN (AlexNet) that won ImageNet 2012. Another paper describes a significantly deeper CUDA CNN that won ImageNet 2014:

  • A Krizhevsky, I Sutskever, GE Hinton. ImageNet classification with deep convolutional neural networks. NeurIPS 2012 (53k)

  • K Simonyan, A Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014 (32k)

The paper with the most citations per year is a recent one on the much deeper ResNet which won ImageNet 2015:

  • K He, X Zhang, S Ren, J Sun. Deep Residual Learning for Image Recognition. CVPR 2016 (36k; 18k in 2019)

Remarkably, such “contest-winning deep GPU-based CNNs” can also be traced back to the Schmidhuber lab. Krizhevsky cites DanNet, the first CUDA CNN to win image recognition challenges and the first superhuman CNN (2011). I learned this on reddit: DanNet, the CUDA CNN of Dan Ciresan in Jürgen Schmidhuber’s team, won 4 image recognition challenges prior to AlexNet: the ICDAR 2011 Chinese handwriting contest, the IJCNN 2011 traffic sign recognition contest, the ISBI 2012 image segmentation contest, and the ICPR 2012 medical imaging contest.

ResNet is much deeper than DanNet and AlexNet and works even better. It cites the Highway Net (Srivastava & Greff & Schmidhuber, 2015) of which it is a special case. In a sense, this closes the LSTM circle, because “Highway Nets are essentially feedforward versions of recurrent Long Short-Term Memory (LSTM) networks.”

Most LSTM citations refer to the 1997 LSTM paper. However, Schmidhuber’s post on their Annus Mirabilis points out that “essential insights” for LSTM date back to Sepp Hochreiter’s 1991 diploma thesis, which he considers “one of the most important documents in the history of machine learning.” (He also credits other students: “LSTM and its training procedures were further improved … through the work of my later students Felix Gers, Alex Graves, and others.”)

The LSTM principle is essential for both recurrent networks and feedforward networks. Today it is on every smartphone, in DeepMind’s StarCraft champion and OpenAI’s Dota champion, and in thousands of additional applications. It is the core of the deep learning revolution.

submitted by /u/lstmcnn

[D] DeepMind FTW’s multi-speed RNN implementation, and their use of z_t

I am trying to re-implement FTW’s multi-speed RNN proposed in Section 2.1 of this paper (from DeepMind). Their use of z_t is quite confusing.

It seems like something I need to take care of during loss and gradient calculation. But then, in Figure S10, it is clearly part of the NN model. Or is z_t just the output from the fast RNN?
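For reference, here is a rough PyTorch sketch of how I currently picture the two-timescale part (a slow core stepped every few ticks, a fast core stepped every tick and conditioned on the slow state). This is just my guess, it ignores z_t entirely, and the layer sizes are made up:

import torch
from torch import nn

class TwoTimescaleRNN(nn.Module):
    # Generic fast/slow sketch, not necessarily what the paper does:
    # the slow core updates every `slow_period` steps, the fast core every step,
    # and the fast core receives the slow hidden state as extra input.
    def __init__(self, input_size, hidden_size=256, slow_period=4):
        super().__init__()
        self.slow_period = slow_period
        self.slow = nn.LSTMCell(input_size, hidden_size)
        self.fast = nn.LSTMCell(input_size + hidden_size, hidden_size)

    def forward(self, inputs):
        # inputs: (time, batch, input_size)
        T, B, _ = inputs.shape
        hs = cs = hf = cf = inputs.new_zeros(B, self.slow.hidden_size)
        outputs = []
        for t in range(T):
            if t % self.slow_period == 0:  # slow core ticks only every k steps
                hs, cs = self.slow(inputs[t], (hs, cs))
            hf, cf = self.fast(torch.cat([inputs[t], hs], dim=-1), (hf, cf))
            outputs.append(hf)
        return torch.stack(outputs)  # (time, batch, hidden_size)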

Does anyone have an idea how to implement this?

submitted by /u/XMasterDE

[D] Some novel techniques I found that accelerate Transformer-XL to some extent

  1. The original Transformer-XL caches the previous hidden activations (the inputs to the Q, K and V projections) and recomputes K and V for the memory at each iteration, but I found that you can instead cache K and V themselves and skip recomputing them. This resulted in only negligible performance degradation as far as my toy experiment on WikiText-103 went. It reduces computation at the cost of roughly doubling the GPU memory used for the cache.
  2. Another trick is to apply the technique of [1], i.e., making K and V a single head with hidden dimension 64. Apparently the author did not try this, but I found it works even on long-range language modeling like WikiText-103 with negligible performance degradation. This essentially means that (1) the GPU memory used by the memory part becomes tiny, and (2) computing Q, K and V costs little more than computing Q alone. (See the sketch after this list.)
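To make these two tricks concrete, here is a rough PyTorch sketch (an illustration of the idea, not the code from my experiment; names and sizes are made up): queries stay multi-head, K and V are a single shared head of dimension 64, and the memory is kept as cached K/V tensors rather than cached hidden states, so the cached part is never recomputed.

import torch
from torch import nn

class CachedMultiQueryAttention(nn.Module):
    # Sketch of a Transformer-XL-style attention layer with (a) a K/V cache for the
    # memory segment and (b) a single shared K/V head as in [1] (multi-query attention).
    def __init__(self, d_model=512, n_heads=8, d_head=64, mem_len=512):
        super().__init__()
        self.n_heads, self.d_head, self.mem_len = n_heads, d_head, mem_len
        self.q_proj = nn.Linear(d_model, n_heads * d_head)   # per-head queries
        self.kv_proj = nn.Linear(d_model, 2 * d_head)         # one shared K/V head
        self.out_proj = nn.Linear(n_heads * d_head, d_model)

    def forward(self, x, kv_cache=None):
        # x: (batch, cur_len, d_model); kv_cache: (K_mem, V_mem) from earlier segments
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head)
        k, v = self.kv_proj(x).split(self.d_head, dim=-1)      # only the new tokens' K/V
        if kv_cache is not None:                                # reuse cached memory K/V as-is
            k = torch.cat([kv_cache[0], k], dim=1)
            v = torch.cat([kv_cache[1], v], dim=1)
        new_cache = (k[:, -self.mem_len:].detach(), v[:, -self.mem_len:].detach())
        # every query head attends to the same single K/V head
        scores = torch.einsum("bthd,bsd->bhts", q, k) / self.d_head ** 0.5
        # (relative positional encodings and the causal mask are omitted for brevity)
        out = torch.einsum("bhts,bsd->bthd", scores.softmax(dim=-1), v)
        return self.out_proj(out.reshape(B, T, -1)), new_cache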

By combining these techniques, you get (1) almost no GPU memory use for the memory part, (2) essentially only Q to compute for the current sequence each iteration, and (3) very fast inference even for long-range language modeling! I hope you found this post useful. (Caveat: the experiment I performed was under the assumption that the dataset is large enough.)
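Continuing the sketch above, at inference time the cache is simply threaded through the segments (shapes here are arbitrary):

attn = CachedMultiQueryAttention(d_model=512)
cache = None
for segment in torch.randn(4, 3, 128, 512).unbind(dim=1):  # 3 segments of 128 tokens, batch 4
    y, cache = attn(segment, kv_cache=cache)                # cached K/V are reused, not recomputed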

[1] Fast Transformer Decoding: One Write-Head is All You Need, Noam Shazeer

submitted by /u/HigherTopoi