Category: Reddit MachineLearning

[P] VoxCeleb dataset trained on MobileNet for speaker recognition and tuned for speaker verification

Thought maybe some people would be interested in this project I worked on last year. I used the VoxCeleb dataset to train MobileNet for speaker recognition. The audio is processed into a spectrogram, and then the first- and second-order derivatives are computed to produce three-channel input. After training was done, I used a Siamese network technique to tune the features for verification rather than categorization. The idea of the project was to run the model on a smartphone (hence MobileNet) and use it for speaker verification.
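
For anyone curious about the preprocessing, the three-channel input described above is typically built along these lines (a hedged sketch using librosa; the filename and parameter values are illustrative, and the repo may differ):

    # log-mel spectrogram plus its first- and second-order deltas,
    # stacked into a 3-channel "image" for a CNN like MobileNet
    import librosa
    import numpy as np

    y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical file
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel)
    delta1 = librosa.feature.delta(log_mel, order=1)  # first-order derivative
    delta2 = librosa.feature.delta(log_mel, order=2)  # second-order derivative
    features = np.stack([log_mel, delta1, delta2], axis=-1)  # (mels, frames, 3)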

I’m curious what others think about the techniques used and the results, let me know if you are interested in more details!

https://github.com/jpinedaa/Voice-ML (code is messy as hell, I might organize it later)

submitted by /u/ExtremeGeorge

[D] Why do Variational Autoencoders encode each datapoint to an individual normal distribution over z, rather than forcing all encodings Z to be normally distributed?

As in the title. Variational autoencoders encode each data sample x_i to a distribution over z, and then minimize the KL divergence between q(z|x_i) and p(z), where p(z) is N(0, I). In cases where the encoder does a good job of minimizing the KL loss, the reconstruction is often poor, and in cases where the reconstruction is good, the encoder may not do a good job of mapping onto p(z).
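
For reference (a standard identity, not stated in the post): with a diagonal Gaussian encoder q(z|x_i) = N(mu_i, diag(sigma_i^2)), that per-sample KL term has the closed form

    D_{\mathrm{KL}}\big(\mathcal{N}(\mu_i,\operatorname{diag}(\sigma_i^2)) \,\big\|\, \mathcal{N}(0,I)\big) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_{i,j}^2 + \sigma_{i,j}^2 - \log\sigma_{i,j}^2 - 1\right),

which is what pulls every individual q(z|x_i) toward N(0, I), rather than just the aggregate of the encodings.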

Is there some reason why we can’t just feed in all datapoints x, which gives us a distribution over all encodings z, and then force those encodings to be normally distributed (i.e. find the mean and stdev over z, and penalize their distance from N(0, I))? This way, you don’t even need to use the reparameterization trick. If you wanted to, you could also still have each point be a distribution; you would just need to take each individual variance into account as well as the means.
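
A minimal sketch of the batch-level alternative described above, assuming a deterministic encoder and PyTorch (all names here are illustrative, not from the post):

    import torch

    def batch_moment_penalty(z: torch.Tensor) -> torch.Tensor:
        """Penalize distance of a batch of encodings z (batch, dim)
        from a standard normal, via its first two moments."""
        mu = z.mean(dim=0)    # per-dimension batch mean
        std = z.std(dim=0)    # per-dimension batch std
        # drive means to 0 and stds to 1, matching N(0, I)
        return (mu ** 2).mean() + ((std - 1.0) ** 2).mean()

    # usage inside a training step (encoder/decoder are assumed modules):
    # z = encoder(x)          # deterministic encodings, no reparameterization
    # loss = recon_loss(decoder(z), x) + lam * batch_moment_penalty(z)

Note that this matches only the first two moments of the aggregate, whereas the per-sample KL constrains the full shape of each posterior.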

I’ve tested this out and it works without any issue, so is there some theoretical reason why it’s not done this way? Is it standard practice in variational methods for each datapoint x_i to have its own distribution, and if so, why?

submitted by /u/DearJudge

[D] How to deploy ML models in a web production environment?

I was reading this article – https://medium.com/faun/mastering-the-mystical-art-of-model-deployment-c0cafe011175 – which details how to use Amazon SageMaker to deploy machine learning models in a web production environment.

I wanted to know if there are any other tutorials about ML Model deployment using open source/other technologies.
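
One common open-source route (an illustrative sketch, not from the article; the model file and request schema are hypothetical) is to wrap the trained model in a small web service, e.g. with FastAPI:

    # minimal model-serving sketch: any pickled scikit-learn estimator
    # behind a single POST endpoint
    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = joblib.load("model.joblib")  # hypothetical trained model

    class Features(BaseModel):
        values: list[float]

    @app.post("/predict")
    def predict(features: Features):
        prediction = model.predict([features.values])
        return {"prediction": prediction.tolist()}

    # run with: uvicorn main:app --host 0.0.0.0 --port 8000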

submitted by /u/themonkwarriorX

[R] On the information bottleneck theory of deep learning

https://iopscience.iop.org/article/10.1088/1742-5468/ab3985

Abstract

The practical successes of deep neural networks have not been matched by theoretical progress that satisfyingly explains their behavior. In this work, we study the information bottleneck (IB) theory of deep learning, which makes three specific claims: first, that deep networks undergo two distinct phases consisting of an initial fitting phase and a subsequent compression phase; second, that the compression phase is causally related to the excellent generalization performance of deep networks; and third, that the compression phase occurs due to the diffusion-like behavior of stochastic gradient descent. Here we show that none of these claims hold true in the general case, and instead reflect assumptions made to compute a finite mutual information metric in deterministic networks. When computed using simple binning, we demonstrate through a combination of analytical results and simulation that the information plane trajectory observed in prior work is predominantly a function of the neural nonlinearity employed: double-sided saturating nonlinearities like tanh yield a compression phase as neural activations enter the saturation regime, but linear activation functions and single-sided saturating nonlinearities like the widely used ReLU in fact do not. Moreover, we find that there is no evident causal connection between compression and generalization: networks that do not compress are still capable of generalization, and vice versa. Next, we show that the compression phase, when it exists, does not arise from stochasticity in training by demonstrating that we can replicate the IB findings using full batch gradient descent rather than stochastic gradient descent. Finally, we show that when an input domain consists of a subset of task-relevant and task-irrelevant information, hidden representations do compress the task-irrelevant information, although the overall information about the input may monotonically increase with training time, and that this compression happens concurrently with the fitting process rather than during a subsequent compression period.
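
For context, the "simple binning" estimator the abstract refers to is usually along these lines (a generic sketch, not the paper's code): discretize hidden activations T into fixed bins; for a deterministic network with distinct inputs, I(T; X) then reduces to the entropy of the binned states.

    import numpy as np

    def binned_information(t: np.ndarray, n_bins: int = 30) -> float:
        """Estimate I(T; X) for a deterministic layer by binning activations.
        t: (n_samples, n_units) hidden activations over the dataset."""
        # discretize every activation into one of n_bins equal-width bins
        edges = np.linspace(t.min(), t.max(), n_bins + 1)
        digitized = np.digitize(t, edges[1:-1])
        # treat each row's vector of bin indices as one discrete state
        _, states = np.unique(digitized, axis=0, return_inverse=True)
        # for a deterministic map with distinct inputs, H(T|X) = 0,
        # so I(T; X) = H(T), the entropy of the binned states
        _, counts = np.unique(states, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())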

submitted by /u/downtownslim

[P] Cortex v0.12: Deploy models as production APIs

Repo Link: https://github.com/cortexlabs/cortex

We’ve just released a new version of Cortex, our open source platform for deploying trained models from any framework as production APIs on AWS. With this newest version, Cortex now also supports:

  • Auto Scaling. If your traffic increases, Cortex will spin up new replicas to handle things. If your traffic decreases, Cortex will reduce replicas to save on cost.
  • Spot Instances. Cortex can run on AWS Spot Instances, which can reduce instance costs by as much as 90%.
  • More Instance Types. Cortex now supports g3 and g4 instance types.
  • Batched Predictions. Cortex can now batch predictions.

submitted by /u/calebkaiser

[D] What is the best segmentation network for organ segmentation in CT scans?

I have tried U-Net, att_r2_unet, and a lot of variations of 2D networks. All of them are fantastic. I wanted to try 3D U-Net but I failed; it didn’t do its job at all. There are so many steps, and I probably messed up somewhere. There are so many new models nowadays; what has been proven to be the best out there for this job? I’m tired of trying new networks/methods randomly and then finding another network.
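
Not an answer to "which network is best", but one frequent reason 3D U-Nets seem not to work at all is memory: whole CT volumes rarely fit on a GPU, so training typically samples fixed-size 3D patches instead (a generic sketch with hypothetical shapes, not tied to any particular library):

    import numpy as np

    def sample_patch(volume: np.ndarray, label: np.ndarray,
                     patch=(64, 64, 64), rng=np.random):
        """Sample a random 3D patch from a CT volume and its segmentation mask.
        volume, label: (D, H, W) arrays; patch must fit inside the volume."""
        d, h, w = volume.shape
        pd, ph, pw = patch
        z = rng.randint(0, d - pd + 1)
        y = rng.randint(0, h - ph + 1)
        x = rng.randint(0, w - pw + 1)
        return (volume[z:z+pd, y:y+ph, x:x+pw],
                label[z:z+pd, y:y+ph, x:x+pw])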

submitted by /u/blue20whale
[link] [comments]