Author: torontoai

[D] SELUs don’t actually solve the dying ReLU problem

Written on August 19, 2019. Posted in Reddit MachineLearning.

One frequently mentioned problem with ReLUs is that they can get stuck outputting nothing but 0s when their input shifts such that every value is negative. SELUs [1] claim to solve this problem.

However, there is another way that activation functions can stop being useful to the network: when they degenerate to a linear function. This can happen with ReLUs, SELUs and some other activation functions when their input shifts such that every value is positive. To demonstrate this I made a simple toy network.

The task is to approximate the ReLU function itself with the function f(x * a + b) * c + d, where x is the input, a, b, c and d are learned scalar values and f is an activation function. Values for x are uniformly chosen from the range [-0.5, 0.5].

If we start with a = 1 and b = 0.5 then all inputs to f will be positive. For many starting points of c and d this will still converge when ReLU is used for f. But for c = 1 and d = -0.5 all common piecewise activation functions will fail, including SELU and ELU.
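To make the degenerate starting point concrete, here is a minimal pure-Python sketch (not the author's TensorFlow code). It assumes ELU with alpha = 1 and the SELU constants from [1], and it uses an evenly spaced grid as a stand-in for the uniformly sampled inputs. It only checks that, at a = 1, b = 0.5, c = 1, d = -0.5, the toy network is a straight line for both ReLU and SELU.

```python
import math

# SELU constants from the SELU paper [1]; ELU with alpha = 1 is an assumption.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def relu(z):
    return max(z, 0.0)

def elu(z, alpha=1.0):
    return z if z > 0 else alpha * (math.exp(z) - 1.0)

def selu(z):
    return SELU_LAMBDA * elu(z, SELU_ALPHA)

def model(f, x, a, b, c, d):
    """The toy network from the post: f(x * a + b) * c + d."""
    return f(x * a + b) * c + d

# Evenly spaced stand-ins for the uniformly sampled inputs in [-0.5, 0.5].
xs = [i / 100.0 - 0.5 for i in range(101)]
a, b, c, d = 1.0, 0.5, 1.0, -0.5

# Every pre-activation x * a + b lies in [0, 1], so no input to f is negative.
pre_activations = [x * a + b for x in xs]

# With ReLU the model is the identity on this range ...
relu_is_identity = all(abs(model(relu, x, a, b, c, d) - x) < 1e-12 for x in xs)

# ... and with SELU it is the straight line lambda * (x + 0.5) - 0.5.
selu_is_linear = all(
    abs(model(selu, x, a, b, c, d) - (SELU_LAMBDA * (x + 0.5) - 0.5)) < 1e-12
    for x in xs
)

print(min(pre_activations), relu_is_identity, selu_is_linear)
```

In this regime the activation contributes no curvature, so the network can only represent a line until some pre-activations become negative again.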

However, there is an activation function that does not exhibit this problem and that I don’t see discussed much: Softplus, defined as log(exp(x) + 1). Its derivative is strictly monotonically increasing, so the function is non-linear on every subrange. Using Softplus in place of f in the toy example allows it to converge from any starting point. [proof pending]
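The monotonic-derivative property can be checked numerically. The sketch below is an illustration rather than the author's code: it uses a numerically stable form of log(exp(x) + 1), relies on the fact that the derivative of Softplus is the logistic sigmoid, and tests strict monotonicity on an arbitrarily chosen grid from -5 to 5.

```python
import math

def softplus(x):
    """Numerically stable evaluation of log(exp(x) + 1)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def softplus_grad(x):
    """Derivative of softplus, which is the logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Grid chosen for illustration only: -5.0, -4.9, ..., 5.0.
xs = [i / 10.0 for i in range(-50, 51)]
grads = [softplus_grad(x) for x in xs]

# Each gradient is strictly larger than the previous one on this grid.
strictly_increasing = all(g0 < g1 for g0, g1 in zip(grads, grads[1:]))
print(strictly_increasing)
```

Far from 0 the sigmoid saturates in floating point, so the strict inequality is only checked on a moderate range here.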

In the following images you can see the learned function at different numbers of iterations. The starting point a = 1, b = 0.5, c = 1 and d = -0.5 was used. All runs use the Adam optimizer with a learning rate of 0.1 and default values for alpha and beta. The mean absolute difference is minimized. TensorFlow 1.14.0 was used.
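The setup described above can be approximated without TensorFlow. The following is a rough pure-Python sketch, not a reproduction of the original experiment. It uses hand-written gradients and a textbook Adam update with beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8. The batch size, step count and random seed are assumptions, since the post does not state them, and no particular outcome is claimed.

```python
import math
import random

def relu(z):
    return max(z, 0.0)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def softplus_grad(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))

def sign(v):
    return (v > 0) - (v < 0)

def loss_and_grads(f, f_grad, params, xs):
    """Mean absolute error against relu(x) and its gradient w.r.t. a, b, c, d."""
    a, b, c, d = params
    loss, grads = 0.0, [0.0] * 4
    for x in xs:
        z = x * a + b
        err = f(z) * c + d - relu(x)
        s = sign(err)  # subgradient of |err|
        loss += abs(err)
        grads[0] += s * c * f_grad(z) * x
        grads[1] += s * c * f_grad(z)
        grads[2] += s * f(z)
        grads[3] += s
    n = len(xs)
    return loss / n, [g / n for g in grads]

def train(f, f_grad, params, steps=200, lr=0.1, batch=256, seed=0,
          beta1=0.9, beta2=0.999, eps=1e-8):
    """Textbook Adam; steps, batch and seed are assumed, not from the post."""
    rng = random.Random(seed)
    params = list(params)
    m, v = [0.0] * 4, [0.0] * 4
    for t in range(1, steps + 1):
        xs = [rng.uniform(-0.5, 0.5) for _ in range(batch)]
        _, grads = loss_and_grads(f, f_grad, params, xs)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

start = (1.0, 0.5, 1.0, -0.5)
eval_xs = [i / 100.0 - 0.5 for i in range(101)]
results = {}
for name, f, f_grad in [("relu", relu, relu_grad),
                        ("softplus", softplus, softplus_grad)]:
    fitted = train(f, f_grad, start)
    final_loss, _ = loss_and_grads(f, f_grad, fitted, eval_xs)
    results[name] = (fitted, final_loss)
    print(name, [round(p, 3) for p in fitted], round(final_loss, 4))
```

Swapping in other activation functions only requires a matching pair of f and f_grad. Results may differ from the images in the post because the sampling details and the optimizer implementation are not identical.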

ReLU

SELU

Softplus

In practice the inputs to activation functions may follow a long-tailed distribution, making this very unlikely when the loss function is fixed. But for some problems, like adversarial networks, where the loss function itself is learned, this might not be the case.

There are even situations where SELU fails to converge whereas ReLU and ELU do. The following images use the starting point a = 1, b = 0.5, c = 1 and d = 0. Again, all initial inputs to the activation function are positive. As we can see this does not necessarily mean that it is stuck.

ReLU. The slight curve at 0 is a result of under-sampling the function.

SELU. Note how the initial increase in gradient below 0 creates an insurmountable wall of increased loss that gradient descent can’t overcome.

ELU. By having a monotonic gradient it does not have the same problem as SELU.

Softplus
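One way to read the SELU and ELU captions above is through the derivatives of the two functions. The sketch below is an illustration under stated assumptions, not the author's analysis. It assumes the SELU constants from [1] and ELU with alpha = 1. Just below 0 the SELU gradient approaches lambda * alpha (about 1.758), then drops to lambda (about 1.051) above 0, so it is not monotonic. The ELU gradient rises to 1 and stays there.

```python
import math

# SELU constants from the SELU paper [1]; ELU with alpha = 1 is an assumption.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu_grad(z):
    """Derivative of SELU: lambda for z > 0, lambda * alpha * exp(z) otherwise."""
    return SELU_LAMBDA if z > 0 else SELU_LAMBDA * SELU_ALPHA * math.exp(z)

def elu_grad(z, alpha=1.0):
    """Derivative of ELU: 1 for z > 0, alpha * exp(z) otherwise."""
    return 1.0 if z > 0 else alpha * math.exp(z)

zs = [-2.0, -1.0, -0.1, -0.001, 0.001, 0.1, 1.0]
for z in zs:
    print(f"{z:7.3f}  selu'={selu_grad(z):.4f}  elu'={elu_grad(z):.4f}")

# The SELU gradient is larger just below 0 than just above it.
selu_jump = selu_grad(-1e-9) - selu_grad(1e-9)

# The ELU gradient never decreases along the sampled points.
elu_grads = [elu_grad(z) for z in zs]
elu_monotonic = all(g0 <= g1 for g0, g1 in zip(elu_grads, elu_grads[1:]))
```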

Alternative title for this post: SELU considered harmful. (I mean no offense to the authors, their paper is truly insightful and you should definitely read it!)

[1] https://arxiv.org/abs/1706.02515

submitted by /u/relgukxilef

[P] Tensorflow live video generation

Written on August 19, 2019. Posted in Reddit MachineLearning.

I am currently working on a project to generate live video clips using BigGAN.

I started with the default code from the DeepMind Colab, but I am now facing an optimization problem.

I need as many fps as possible, but doing a sess.run() for every frame is very heavy. I looked into converting the model to a TF Lite model, but I didn’t find any information on how to do that from a TensorFlow Hub module.

I’m pretty new to TensorFlow and haven’t found any other ideas to improve my code, so I would appreciate any help.

submitted by /u/Remideza

Acing Data Science Interviews

Written on August 19, 2019. Posted in Vimarsh Karbhari.

According to Indeed, there is a 344% increase in demand for data scientists year over year.

In January 2018, I started the Acing AI blog with a goal to help people get into data science. My first article was about “The State of Autonomous Transportation”. As I wrote more, I realized people were interested in acing data science interviews. This led me to start my articles covering various companies’ data science interview questions and processes. The Acing AI blog will continue to have interesting articles as always. This journey continues with today’s exciting announcements.

First, we are launching the Acing Data Science/Acing AI newsletter. This newsletter will always be free. We will be sharing interesting data science articles, interview tips and more via the newsletter. Some of you are already subscribed to this newsletter and will continue to get emails on it.

Through my first newsletter, I also wanted to share the next evolution of the Acing AI blog, Acing Data Science Interviews.

I partnered with Johnny to come up with an amazing course to help people ace data science interviews. We packaged everything we have learned from conducting interviews, giving interviews, writing these blogs and learning from the best people in data science into this course. Think of a collective six-plus years of learning condensed into a three-month course. That is Acing Data Science Interviews.

At a high level, we will cover different topics from a data science interview perspective. These include SQL, coding, probability and statistics, data analysis, machine learning algorithms, advanced machine learning, machine learning system design, deep learning, neural networks, big-data concepts and, finally, how to approach a data science interview. The first few topics provide the foundational aspects of data science. They are followed by the data science application topics. Collectively, these should encompass everything that could be asked in a data science interview.

The first sessions will start in the second half of September 2019. We are aiming for a small group of 15 people. The original course will cost only $199. We are focused on quality and want to provide the best experience, which is why we are keeping the group small.

Acing Data Science Interviews

Thank you for reading!

Acing Data Science Interviews was originally published in Acing AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

[P] Train CIFAR10 to 94% in 26 SECONDS on a single-GPU

Written on August 19, 2019. Posted in Reddit MachineLearning.

In this blog post, the author introduces a bag of standard and not-so-standard tricks that reduce the training time of a ResNet model on the CIFAR10 dataset to 34s, or 26s with test-time augmentation.

Blog post: https://myrtle.ai/how-to-train-your-resnet-8-bag-of-tricks/

Colab notebook: https://colab.research.google.com/github/davidcpage/cifar10-fast/blob/master/bag_of_tricks.ipynb

Author: David Page

Original tweet: https://twitter.com/dcpage3/status/1163563850442182657

submitted by /u/youali

Machine Learning Scientist – Chisel AI – Toronto, ON

Written on August 19, 2019. Posted in Toronto Job Postings.

Strong background in machine learning and deep learning with experience in Natural Language Processing. Develop machine learning algorithms for information…
From Chisel AI – Tue, 20 Aug 2019 12:09:05 GMT – View all Toronto, ON jobs

[D] Uncover new, more meaningful KPIs with Machine Learning

Written on August 19, 2019. Posted in Reddit MachineLearning.

It is well known that machine learning is already helping companies achieve their performance goals by optimizing existing performance metrics. By leveraging the growing volume of data on customer behavior, pricing, competitive action, and operational statistics, it can deliver critical insights in a variety of ways. Machine learning offers many benefits from optimizing marketing or pricing to improving customer service and operational efficiency. However, a recent article in the MIT Sloan Management Review shows that companies are increasingly using machine learning to identify entirely new KPIs to correlate with overall performance.

submitted by /u/seemingly_omniscient

[D] Symmetry-equivalent representations

Written on August 19, 2019. Posted in Reddit MachineLearning.

I’m training a regression model that learns the mapping from integer-valued vectors to a single real-valued property. All cyclic permutations of my feature vectors are equivalent, that is, they have the same y. I’m a bit lost trying to encode this.

One idea I had was to augment the dataset by generating all the cyclic permutations, but I don’t think this is a good way to go at all. I’ve stumbled on strategies to encode cyclic features such as months by mapping them to a periodic function, but in my case this wouldn’t work as the elements of my vector have a different meaning.
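For concreteness, the augmentation idea mentioned above can be sketched in a few lines of Python. The helper names are hypothetical and the example vector is made up. The second function, which maps every rotation to one canonical representative (the lexicographically smallest), is not from the post. It is included only as one possible alternative to enumerating all rotations in the training set.

```python
def cyclic_permutations(vec):
    """Return every rotation of vec as a tuple, starting with vec itself."""
    vec = list(vec)
    # max(..., 1) keeps the empty vector well defined: it has one rotation, ().
    return [tuple(vec[i:] + vec[:i]) for i in range(max(len(vec), 1))]

def canonical_rotation(vec):
    """Hypothetical helper: pick the lexicographically smallest rotation."""
    return min(cyclic_permutations(vec))

example = [3, 1, 2]  # made-up feature vector
rotations = cyclic_permutations(example)

# All rotations of the same vector share one canonical form.
canonical_forms = {canonical_rotation(r) for r in rotations}
print(rotations, canonical_forms)
```

Augmenting with all rotations multiplies the dataset size by the vector length, whereas a canonical form keeps one row per equivalence class. Whether either suits a given model is a separate question.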

submitted by /u/throwervek

[Discussion] Is Sagemaker just a glorified EC2 instance?

Written on August 19, 2019. Posted in Reddit MachineLearning.

I’m a data scientist with a lot of model and math knowledge, and experience with mostly on-prem tools and some GCP. I’m trying to pick up more cloud skills. As I experiment more with SageMaker, I can’t figure out how it is more than just an EC2 instance with the right libraries installed. Is there anything more to it? What am I missing?

submitted by /u/AlexSnakeKing

[R] Video Frame Interpolation via Cyclic Fine-Tuning and Asymmetric Reverse Flow

Written on August 19, 2019. Posted in Reddit MachineLearning.

Want to convert your video to slow motion?
https://github.com/MortenHannemose/pytorch-vfi-cft

submitted by /u/mohanne

[D] Why isn’t bayesian inference using Gibbs Sampling / MCMC / HMC done on GPUs?

Written on August 19, 2019. Posted in Reddit MachineLearning.

I’ve seen MultiBUGS, which claims to achieve impressive speedups by exploiting multiple cores, but for the most part I’ve not seen any of the existing Bayesian inference tools leverage the GPU. Does anyone know why?
