Category: Reddit MachineLearning

[D] Does it get better?

To give a bit of background: I joined a big N company a bit over a year ago in an ML research position, after finishing grad school and having some moderate success in academic research.

During my time here my team has been involved in 3 projects. Results have ranged from failure to successes that hardly resulted from ML research: time put into ML optimization produced only marginal improvements, and most of the actual value came from unrelated work. Currently, even though our evaluations seem to be pretty good, I think I am not bringing all that much value to the company. I really hope that I’ve just had a rough start and that it gets better, but I wanted to hear others’ experiences.

submitted by /u/jamesaliam

[P][D] Pytorch Sparse training library. Sparse training = fraction of all parameters updated each step. Non-used parameters saved to disk -> reduce GPU Memory Usage + Increase Training Speed. If you are working with such an architecture, let us know and we’ll optimize and include it in our release.

Hello,

We are creating a sparse training library for PyTorch. Sparse training is when only a fraction of the total parameters go through a forward pass / backward pass / update during each step.

Keeping all parameters in GPU memory takes up a lot of space, and in some cases may limit the total number of parameters your system can hold. Storing parameters on disk when they are not in use would significantly reduce the GPU memory used at any given moment, allowing you to use many more parameters.
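To make the idea concrete, here is a minimal sketch of a disk-backed parameter table in which only the rows touched by a step are read, updated, and written back. It is not the library described in this post and does not use PyTorch; the names DiskBackedTable and sparse_sgd_step are hypothetical, and it assumes fixed-width float32 rows in a flat binary file.

```python
import os
import struct
import tempfile


class DiskBackedTable:
    """Fixed-width float32 rows in a flat binary file (hypothetical helper)."""

    def __init__(self, path, num_rows, dim):
        self.path = path
        self.row_fmt = f"<{dim}f"                    # one little-endian float32 row
        self.row_bytes = struct.calcsize(self.row_fmt)
        with open(path, "wb") as f:                  # all-zero bytes decode to 0.0
            f.write(b"\x00" * (self.row_bytes * num_rows))

    def load(self, row_ids):
        """Read only the requested rows into memory."""
        rows = {}
        with open(self.path, "rb") as f:
            for r in row_ids:
                f.seek(r * self.row_bytes)
                rows[r] = list(struct.unpack(self.row_fmt, f.read(self.row_bytes)))
        return rows

    def store(self, rows):
        """Write updated rows back to their slots on disk."""
        with open(self.path, "r+b") as f:
            for r, values in rows.items():
                f.seek(r * self.row_bytes)
                f.write(struct.pack(self.row_fmt, *values))


def sparse_sgd_step(table, grads, lr=0.1):
    """Apply SGD to rows that received a gradient; every other row stays on disk."""
    active = table.load(grads.keys())
    for r, g in grads.items():
        active[r] = [w - lr * gi for w, gi in zip(active[r], g)]
    table.store(active)
    return len(active)


path = os.path.join(tempfile.mkdtemp(), "embeddings.bin")
table = DiskBackedTable(path, num_rows=1000, dim=4)
# Only rows 3 and 42 are touched in this step; the other 998 are never loaded.
touched = sparse_sgd_step(table, {3: [1.0, 0.0, -1.0, 2.0], 42: [0.5] * 4})
```

A real pipeline would also need to hide disk latency, for example by prefetching the rows for the next batch while the current one is being processed; this sketch does not attempt that.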

A concern is that disks generally do not have low enough latency to make this work. But we were able to figure out a pipeline that makes it work. Not only that, but thanks to a few PyTorch tricks we inadvertently discovered along the way, we think our setup may be (very slightly) faster, though we’ll need to run a bunch of tests to confirm.

At the moment we need to adapt each architecture individually. If you or anyone you know has a sparse training architecture in mind, point us to the paper or code and we’ll optimize and include it.

So far we’ve only been able to find recommender systems that make use of such architectures. If you know of any other architectures, please point them out.

submitted by /u/Research2Vec

[R] ma-gym: multi agent environments based on open ai gym

ma-gym is a collection of simple multi-agent environments based on OpenAI Gym, intended to keep usage simple while exposing core challenges in multi-agent settings.

I made it during my recent internship, and I hope it will be useful to others in their research or for getting started with multi-agent reinforcement learning.

Github: https://github.com/koulanurag/ma-gym

submitted by /u/HeavyStatus4

[N] Google files patent “Deep Reinforcement Learning for Robotic Manipulation”

Patent: https://patents.google.com/patent/WO2018053187A1/en

Inventors: Sergey Levine, Ethan Holly, Shixiang Gu, Timothy Lillicrap

Abstract

Implementations utilize deep reinforcement learning to train a policy neural network that parameterizes a policy for determining a robotic action based on a current state. Some of those implementations collect experience data from multiple robots that operate simultaneously. Each robot generates instances of experience data during iterative performance of episodes that are each explorations of performing a task, and that are each guided based on the policy network and the current policy parameters for the policy network during the episode. The collected experience data is generated during the episodes and is used to train the policy network by iteratively updating policy parameters of the policy network based on a batch of collected experience data. Further, prior to performance of each of a plurality of episodes performed by the robots, the current updated policy parameters can be provided (or retrieved) for utilization in performance of the episode.
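The data flow in the abstract (several robots collect experience, a trainer updates the policy from batches of pooled experience, and each robot fetches the latest parameters before an episode) can be sketched roughly as below. This is a toy illustration only: the scalar "policy", the reward, and the update rule in train_on_batch are invented for the example and are not the patent's algorithm, and ParameterStore is a hypothetical stand-in for whatever serves parameters to the robots.

```python
import random


class ParameterStore:
    """Holds the latest policy parameters (hypothetical stand-in for a parameter server)."""

    def __init__(self, theta):
        self.theta, self.version = theta, 0

    def update(self, theta):
        self.theta, self.version = theta, self.version + 1


def run_episode(theta, rng, steps=5, target=2.0):
    """One exploratory episode: act around theta and record (action, reward) pairs."""
    experience = []
    for _ in range(steps):
        action = theta + rng.gauss(0.0, 0.5)         # exploration noise
        reward = -(action - target) ** 2             # toy task: get close to target
        experience.append((action, reward))
    return experience


def train_on_batch(theta, batch, lr=0.5):
    """Toy update, NOT the patent's method: step toward the best-rewarded action."""
    best_action, _ = max(batch, key=lambda pair: pair[1])
    return theta + lr * (best_action - theta)


rng = random.Random(0)
store = ParameterStore(theta=0.0)
buffer = []                                          # pooled experience from all robots
num_robots, num_rounds = 4, 20

for _ in range(num_rounds):
    for _robot in range(num_robots):                 # sequential here; parallel on real robots
        theta = store.theta                          # fetch current parameters before the episode
        buffer.extend(run_episode(theta, rng))
    batch = rng.sample(buffer, k=16)                 # batch drawn from the pooled experience
    store.update(train_on_batch(store.theta, batch))
```

The point of the sketch is the structure: experience from all robots lands in one pool, and parameter updates are decoupled from any single robot's episode.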

submitted by /u/Dazzling_Help

[D] Unstable performance during parameter search (Keras)

Hi all,

I was hoping we could discuss the plot below:

https://imgur.com/TMK9RgD

That plot comes from a parameter search using Keras/Tensorflow for a binary classification problem with an unbalanced class distribution (as you can tell from the acc plot, the ratio is about 5:1 negative to positive).

The metric that I am most interested in is Precision, and as you can see in this example it is very unstable, bouncing around wildly between epochs – which obviously doesn’t lend itself to being a good/stable model.

Whilst there is a little overfitting, there doesn’t seem to be too much and I can confirm that the data itself is all properly scaled and normalised.

Although the plot scale is a bit large (sorry) to tell properly, I think what we’d find is that Recall fluctuates in unison with Precision. As Recall bounces upwards, I’d expect Precision to take a dive downwards.
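For reference, a small sketch of how the three metrics relate to confusion-matrix counts at a 5:1 class ratio. The counts are made up for illustration and are not taken from the plot; binary_metrics is a hypothetical helper, not a Keras function.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision is undefined with no predicted positives; report 0.0 here by convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative counts only: 5000 negatives and 1000 positives (5:1).
always_negative = binary_metrics(tp=0, fp=0, fn=1000, tn=5000)
cautious = binary_metrics(tp=50, fp=38, fn=950, tn=4962)
```

With these counts, a model that always predicts the negative class already reaches about 83% accuracy, which is why accuracy says little here. The cautious model has roughly 57% precision at 5% recall, but that precision rests on only 88 predicted positives, so a handful of changed predictions between epochs moves it noticeably.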

I can’t post the exact model because it’s a parameter search with a wide range of possible configurations, but I’m optimising across a range of network depths, widths, dropouts, shapes, learning rates, etc. I’m using binary_crossentropy as the loss, ELU activations, and the Nadam optimizer, though I’ve tried various others with similar results.

What would be your suggestions for creating a more stable model?

At the moment, class_weight is set to {0: 1, 1: 1}. I think upping the positive class weight would somewhat stabilise the model (by increasing recall), but I’m shooting for high precision and accept that recall will be the trade-off and be somewhat low. For example, I’d be happy with 57% precision at 5% recall. In fact, that’s the exact result I got from a previous parameter search, but it didn’t generalise well to the blind test set, and I suspect the cause was the unstable epoch-to-epoch precision we’re seeing in this plot (though I can only see the plots for the “current” model being generated, so by the end of the many-hour parameter search all I have is a CSV of the final values, with no plots to go along with them).
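One possible contributor to the instability is simply sample size: at 5% recall very few examples are predicted positive, so precision is estimated from a small count. A rough sketch of the binomial standard error follows. It assumes each predicted positive is an independent trial, which is a simplification, and the validation-set sizes are hypothetical since the post does not give them.

```python
import math


def precision_standard_error(precision, recall, num_positives):
    """Binomial standard error of precision, given the number of actual positives."""
    true_positives = recall * num_positives
    predicted_positives = true_positives / precision
    return math.sqrt(precision * (1 - precision) / predicted_positives)


# Hypothetical validation sets containing 1,000 and 100,000 positive examples.
se_small = precision_standard_error(0.57, 0.05, num_positives=1000)
se_large = precision_standard_error(0.57, 0.05, num_positives=100000)
```

With 1,000 positives the standard error is about 0.053, so swings of several percentage points in precision between epochs are expected from noise alone. It shrinks with the square root of the number of predicted positives, so 100 times more data gives a 10 times smaller error.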

submitted by /u/Zman420

[R] Autonomous Navigation in Unconstrained Environments

While several datasets for autonomous navigation have become available in recent years, they have tended to focus on structured driving environments. This usually corresponds to well-delineated infrastructure such as lanes, a small number of well-defined categories for traffic participants, low variation in object or background appearance, and strong adherence to traffic rules.

I recently worked with IDD, a dataset collected in India. It is more challenging than other autonomous navigation datasets (such as Berkeley DeepDrive or Cityscapes), since much of the data was captured in non-standard conditions (drivable areas other than roads, etc.).

I’m releasing the code for this work; feel free to use it for your projects or research.

Github: https://github.com/prajjwal1/autonomous-object-detection

Dataset: https://idd.insaan.iiit.ac.in/

submitted by /u/vector_machines

[D] Why have we not seen equivalent success in deep learning based image registration?

It seems that other computer vision tasks such as classification, segmentation and synthesis have seen huge advances in accuracy thanks to CNNs, but there seems to be no equivalent advance in image registration. I tried searching for advances in image registration, but it seems that researchers still use ‘classical’ image registration techniques like mutual information, cross-correlation, etc. Even though there are DL image registration research papers, they are not well adopted in the community.
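For readers unfamiliar with the classical approach, here is a minimal 1-D sketch of registration by normalized cross-correlation (NCC): try candidate integer shifts and keep the one with the highest similarity. Real registration works on 2-D or 3-D images with richer transforms; the function names ncc and best_shift and the toy signals are made up for illustration.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def best_shift(fixed, moving, max_shift):
    """Search integer shifts and return the one that maximizes NCC on the overlap."""
    scores = {}
    for s in range(-max_shift, max_shift + 1):
        # Pair fixed[i] with moving[i + s] wherever both indices are valid.
        pairs = [(fixed[i], moving[i + s]) for i in range(len(fixed))
                 if 0 <= i + s < len(moving)]
        scores[s] = ncc([p[0] for p in pairs], [p[1] for p in pairs])
    return max(scores, key=scores.get)


fixed = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0]
moving = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0]   # same profile, moved 3 samples right
shift = best_shift(fixed, moving, max_shift=4)   # recovers the displacement, 3
```

The similarity measure and the search over transforms are both hand-designed and need no training data, which may be part of why these methods remain the default.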

Fundamentally, is there a reason why this task is more complex than the aforementioned ones?

submitted by /u/deep-yearning

[R] ABD-Net Person Re-ID code is available

https://github.com/TAMU-VITA/ABD-Net

Attention mechanisms have been shown to be effective for person re-identification (Re-ID). However, the learned attentive feature embeddings are often not naturally diverse or uncorrelated, which compromises retrieval performance based on the Euclidean distance. We advocate that enforcing diversity could greatly complement the power of attention. To this end, we propose an Attentive but Diverse Network (ABD-Net), which seamlessly integrates attention modules and diversity regularization throughout the entire network, to learn features that are representative, robust, and more discriminative.

submitted by /u/yang-explore

[N] HGX-2 Deep Learning Benchmarks: The 81,920 CUDA Core “Behemoth” GPU Server

Deep learning benchmarks for TensorFlow on Exxact TensorEX HGX-2 Server.

Original Post from Exxact Here

Notable GPU Server Features

  • 16x NVIDIA Tesla V100 SXM3
  • 81,920 NVIDIA CUDA Cores
  • 10,240 NVIDIA Tensor Cores
  • 0.5 TB Total GPU Memory
  • NVSwitch powered by NVLink, 2.4 TB/sec aggregate speed
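The headline totals follow directly from the per-GPU figures. A quick check, assuming the 32 GB variant of the Tesla V100 with 5,120 CUDA cores and 640 Tensor Cores per GPU (NVIDIA's published specifications, not stated in this post):

```python
# Per-GPU figures for a 32 GB Tesla V100 (assumed from NVIDIA's published specs).
CUDA_CORES_PER_GPU = 5120
TENSOR_CORES_PER_GPU = 640
MEMORY_GB_PER_GPU = 32
NUM_GPUS = 16

total_cuda_cores = NUM_GPUS * CUDA_CORES_PER_GPU        # 81,920
total_tensor_cores = NUM_GPUS * TENSOR_CORES_PER_GPU    # 10,240
total_memory_tb = NUM_GPUS * MEMORY_GB_PER_GPU / 1024   # 0.5 TB
```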

Source: blog.exxactcorp.com

Tests were run on ResNet-50, ResNet-152, Inception V3, and VGG-16. We also compared FP16 to FP32 performance, using a batch size of 256 (except for ResNet-152 FP32, where the batch size was 64). The same tests were run using 1, 2, 4, 8, and 16 GPU configurations. All benchmarks were done using ‘vanilla’ TensorFlow settings for FP16 and FP32.

For the full write-up + tables and numbers visit: https://blog.exxactcorp.com/hgx2-benchmarks-for-deep-learning-in-tensorflow-16x-v100-exxact-tensorex-server/

submitted by /u/exxact-jm