Category: Reddit MachineLearning

[Discussion] Is MINE (Mutual Information Neural Estimation) also helpful for problems that reduce mutual information?

Hello, I have an old but confusing question about Mutual Information Neural Estimation (MINE), ICML 2018.

In the paper, a lower bound on mutual information is estimated with a neural-network-parameterized function (called the statistics network), and various experiments are run, including the information bottleneck (IB), which reduces I(X; Z).

It’s very well written, with theoretical background, but I’m stuck reimplementing the IB results. Unfortunately the paper doesn’t provide full details for the IB section, so if you have any experience using MINE to reduce mutual information, I’d be very glad if you shared it. I built a statistics network following the paper, and I optimize it while using its estimated MI lower bound as the I(X; Z) regularizer. But training seems very sensitive to the initial value of exponential_moving_average(exp(t)). My error rate hovers around 1.5%, which is even worse than a vanilla FCN.
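For readers who want something concrete, here is a minimal NumPy sketch of the two quantities mentioned above: the Donsker-Varadhan style lower bound, E_joint[t] - log E_marginal[exp(t)], and a moving average of exp(t). It is not the paper's or the poster's code. The function names, the `decay` parameter, and the choice to initialize the moving average from the first batch are all assumptions made for illustration.

```python
import numpy as np

def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan estimate: mean(t on joint samples)
    minus log(mean(exp(t on product-of-marginals samples)))."""
    t_joint = np.asarray(t_joint, dtype=float)
    t_marginal = np.asarray(t_marginal, dtype=float)
    # Stable log-mean-exp: subtract the max before exponentiating.
    m = t_marginal.max()
    log_mean_exp = m + np.log(np.mean(np.exp(t_marginal - m)))
    return float(t_joint.mean() - log_mean_exp)

def update_ema(ema, t_marginal, decay=0.99):
    """Moving average of mean(exp(t)) over batches.
    Assumption: when no average exists yet (ema is None), start from the
    current batch value instead of an arbitrary constant."""
    batch_value = float(np.mean(np.exp(np.asarray(t_marginal, dtype=float))))
    if ema is None:
        return batch_value
    return decay * ema + (1.0 - decay) * batch_value
```

Starting the moving average from the first batch is one way to avoid choosing an initial constant by hand; whether it removes the sensitivity described above would have to be checked empirically.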

Also, I’m not fully convinced that models which estimate a lower bound on MI are helpful for problems that reduce MI. Does reducing the ‘approximated’ lower bound of MI guarantee a practical reduction of MI? I think optimizing the MI estimator while also reducing the estimated MI lower bound might not be stable; like a GAN, it is a kind of minimax training. On the other hand, if the statistics network (which increases the lower bound) and our designed loss (which also increases the lower bound) pull in the same direction, I think there is no problem. What do you think?

submitted by /u/pky3436

[Discussion] Is MINE (Mutual Information Neural Estimation) suitable for reducing mutual information?

Hello, I have an old but confusing question about Belghazi et al., Mutual Information Neural Estimation, ICML 2018.

In the paper, a lower bound on mutual information is obtained with a neural-network-parameterized function (what they call the ‘statistics network’), and various experiments are conducted, including the information bottleneck (IB), which is a case of ‘reducing’ I(X; Z).

I’m quite interested in reducing mutual information, so I started to reproduce their results, but I’m stuck.

Unfortunately the paper does not include many details about the IB implementation, so if you have any experience using MINE to reduce mutual information, I’d be very glad if you shared your approach.

The paper is well written, with a clear theoretical background, but I’m not sure how lowering the ‘approximated lower bound’ helps reduce the actual mutual information. Do you think models that estimate a lower bound on mutual information are also practically useful for reducing MI?

submitted by /u/pky3436

[D] Deep learning agent for stock investment

For the last few months I have been trying to create an agent that takes an amount to invest, a stock to invest in, and the time horizon for the investment.

It consists of a deep learning algorithm for prediction, sentiment analysis of news about that company and related companies, data scraped from Google Trends, and stock data for that company and related companies for training. But I am struggling to create the agent itself, which would use this data to make dummy investments.

submitted by /u/tmaloo

[P] Deploy Machine Learning Models with Django

I’ve created a tutorial that shows how to create a web service in Python and Django to serve multiple machine learning models. It differs from (and is more advanced than) most of the tutorials available on the internet:

  • It keeps information about many ML models in the web service. Several ML models, with different versions, can be available at the same endpoint. What is more, many endpoint addresses can be defined.

  • It stores information about requests sent to the ML models; this can be used later for model testing and audit.

  • It includes tests for the ML code and the server code.

  • It can run A/B tests between different versions of ML models.
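The ideas in the list above can be illustrated with a small framework-free Python sketch. This is not the tutorial's code and does not use Django. The `ModelRegistry` class, its method names, and the "pick a random version when none is requested" rule are hypothetical and stand in for whatever the tutorial actually implements.

```python
import random

class ModelRegistry:
    """Hypothetical in-memory registry: endpoint -> {version: predict function}."""

    def __init__(self):
        self._models = {}
        self.request_log = []  # kept for later testing and audit

    def register(self, endpoint, version, predict_fn):
        self._models.setdefault(endpoint, {})[version] = predict_fn

    def predict(self, endpoint, payload, version=None, rng=random):
        versions = self._models.get(endpoint)
        if not versions:
            raise KeyError(f"no models registered for endpoint {endpoint!r}")
        if version is None:
            # Simplistic A/B split (an assumption): choose a version uniformly.
            version = rng.choice(sorted(versions))
        result = versions[version](payload)
        self.request_log.append(
            {"endpoint": endpoint, "version": version,
             "payload": payload, "result": result}
        )
        return version, result
```

Logging the chosen version with every request is what makes a later comparison between versions possible.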

The tutorial is available at https://www.deploymachinelearning.com

The source code from the tutorial is available at https://github.com/pplonski/my_ml_service

submitted by /u/pp314159

[D] The DeepMind quiz has changed: any feedback on the new interview?

Hello,

A while ago I made a post about the DeepMind quiz where people could get relevant information: https://www.reddit.com/r/MachineLearning/comments/bf1xh2/d_any_tips_and_tricks_to_crack_the_deepmind_quiz/

Now it seems that the quiz has changed, and there is a code comprehension part.

If someone could give feedback on this new test, it would be a great help to others!

Thanks!

submitted by /u/DependentSky6

[P] Predict figure skating world championship ranking from season performances (part 2: hybrid models learned by gradient descent)

I recently posted the write-up on the first part of my project (GitHub repo) to predict how skaters would rank in the figure skating world championship from the scores they earned earlier in the season. The main idea is to separate the skater effect (the intrinsic ability of each skater) from the event effect (the influence of an event on a skater’s performance) so that a more accurate ranking can be built.

In that previous part, I considered two simple models to find the latent skater scores that are used to rank the skaters:

  1. Score of a skater at an event = baseline score + latent skater score + latent event score

  2. Score of a skater at an event = baseline score × latent skater score × latent event score

In this part of the project (analysis, write-up), I consider a hybrid model of those two:

Score of a skater at an event = baseline score + latent skater score × latent event score

Unfortunately, unlike the earlier models, this model has no closed-form solution for its parameters. Therefore, gradient descent was used to learn them, which resulted in this neat little animation that tracks how the model residuals, RMSE, and predicted ranking get better and better as gradient descent runs. I also explore different strategies to reduce model overfitting (so that it can predict skater ranking more accurately), using familiar methods such as model penalization and early stopping.
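As a rough illustration of the gradient descent step, here is a minimal NumPy sketch that fits score = baseline + latent skater score × latent event score by minimizing mean squared error. It is not the project's code. The function name `fit_hybrid`, the learning rate, the iteration count, the random initialization, and the synthetic score table are all assumptions, and no penalization or early stopping is included.

```python
import numpy as np

def fit_hybrid(scores, n_iter=2000, lr=0.01, seed=0):
    """scores: 2D array (skaters x events); np.nan marks events a skater missed."""
    scores = np.asarray(scores, dtype=float)
    observed = ~np.isnan(scores)
    n_obs = observed.sum()
    rng = np.random.default_rng(seed)
    baseline = float(np.nanmean(scores))
    skater = rng.uniform(0.5, 1.5, size=scores.shape[0])
    event = rng.uniform(0.5, 1.5, size=scores.shape[1])
    rmse_history = []
    for _ in range(n_iter):
        pred = baseline + np.outer(skater, event)
        # Residuals only count where a score was actually observed.
        resid = np.where(observed, pred - scores, 0.0)
        rmse_history.append(float(np.sqrt((resid ** 2).sum() / n_obs)))
        # Gradients of (1 / (2 * n_obs)) * sum(resid ** 2), all computed
        # before any parameter is updated.
        grad_baseline = resid.sum() / n_obs
        grad_skater = resid @ event / n_obs
        grad_event = resid.T @ skater / n_obs
        baseline -= lr * grad_baseline
        skater -= lr * grad_skater
        event -= lr * grad_event
    return baseline, skater, event, rmse_history

# Synthetic example (made-up numbers, not real competition scores).
true_skater = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
true_event = np.array([1.0, 1.5, 2.0, 1.2])
scores = 100.0 + np.outer(true_skater, true_event)
scores[0, 3] = np.nan  # one skater skipped one event
baseline, skater, event, rmse_history = fit_hybrid(scores)
# Ranking by latent skater score (meaningful when event scores are positive).
ranking = np.argsort(-skater)
```

Plotting `rmse_history` against the iteration number gives a curve comparable in spirit to the RMSE track in the animation described above.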

Lastly, note that this hybrid model simply factorizes the event-skater score matrix into an event-specific vector and a skater-specific vector, whose product approximates the score matrix. Therefore, the gradient descent used to learn these latent vectors is very similar to that of the famous FunkSVD algorithm, which learns user-specific and item-specific latent factors whose product approximates the rating matrix of a recommendation system (in this case user=skater and item=event). However, FunkSVD was used with multiple factors, and in the next part of my project I will show how multi-factor matrix factorization can be applied to this ranking problem.

If you have any questions or feedback on this, just let me know 🙂

submitted by /u/seismatica