Category: Reddit MachineLearning

[P] Sentence similarity using siamese LSTM

So I have a project where I measure the semantic similarity between pairs of sentences in a dataset. For the dataset, I use STS-Benchmark.

First, I used an English Wikipedia dump to create a word2vec embedding matrix. Then I used the function text_to_sequence to convert my sentences to arrays.

I developed a siamese LSTM, but my problem is that the validation accuracy never increases. I’m stuck at an accuracy of 0.25 to 0.30. When I use Spearman’s correlation, I get a value of 45%.

Here is my code: https://pastebin.com/jPaZbDDM
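
STS-Benchmark labels are graded similarity scores rather than classes, so a rank correlation such as Spearman’s is usually a more informative metric than accuracy. Below is a minimal sketch of that metric in plain Python. The function names and sample scores are made up for illustration and are not taken from the linked pastebin code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(predicted, gold):
    """Spearman's rho: Pearson correlation of the rank-transformed scores."""
    return pearson(average_ranks(predicted), average_ranks(gold))


# Hypothetical model outputs and gold similarity scores.
gold = [0.5, 1.0, 2.5, 4.0, 5.0]
predicted = [0.1, 0.3, 0.2, 0.7, 0.9]
print(round(spearman(predicted, gold), 3))
```

Because only the ordering matters, the predicted scores do not need to be on the same scale as the gold scores.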

submitted by /u/momo11arsenal

[D] So where are we at on the Hype Cycle right now? Before or after the trough of disillusionment?

You never hear about ML or “AI” in the mainstream any more, not since about 2017. AlphaGo and self-driving were exciting, then self-driving started to struggle and people twigged that the self-play approach cannot be translated to applications where you don’t have a ‘game’, i.e. a perfect model of the environment’s response to actions. The only things that make the media now are garbage from OpenAI like “We PuBliSheD a BoOk written by our garbled-text generator”. So I just wonder what people think: this time around the “AI” hype cycle, are we on the way down to a crash, or just past it and quietly plateauing in productivity?

submitted by /u/carrolldunham

[N] Microsoft Incorporates Graphcore AI Chips in Azure Cloud

Graphcore’s AI accelerator chip, the Colossus intelligence processing unit (IPU) is now available for customers to use as part of Microsoft’s Azure cloud platform.

This is the first time any major cloud service provider has publicly offered customers the opportunity to run their data on an accelerator from any of the dozens of AI chip startups and as such, it represents a big win for Graphcore. Microsoft has said access will initially be prioritised for customers who are “pushing the boundaries of machine learning”.

Microsoft and Graphcore have been working together for two years to develop cloud systems and build enhanced vision and natural language processing (NLP) models for the Graphcore IPU. One focus has been Google’s BERT (bidirectional encoder representations from transformers), an NLP model that is currently very popular with search engines, including Google’s own.

Using eight Graphcore IPU processor cards (each with a pair of Colossus accelerators), BERT can be trained in 56 hours, similar to the result for GPU with PyTorch, though it is faster than the GPU with TensorFlow (see graph below). Graphcore says customers are seeing BERT inference throughput increase threefold, with 20% improvement in latency.

Given the level of hype surrounding Graphcore — the company is valued at $1.7 billion — these performance improvements seem rather modest. It remains to be seen whether the promised improvement is enough to tempt customers into optimising their models for the IPU.

Advanced models
At the same time, Graphcore has also released some results on more advanced models, where it showed more dramatic performance improvements.

Inference on image processing model ResNext was accelerated 3.4x in terms of throughput at 18x lower latency, compared to a GPU solution consuming the same amount of power. ResNext uses a technique called group separable convolutions, which splits convolution filters into smaller separable blocks to increase accuracy while reducing the parameter count. This approach is well-suited to the IPU, Graphcore says, because of the chip’s massively parallel processor architecture and more flexible, high-throughput memory; smaller blocks of data can be mapped to thousands of fully independent processing threads.
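
To make the parameter-count claim concrete, here is a small arithmetic sketch. It assumes that “group separable convolutions” refers to grouped convolutions, where each output channel only connects to a slice of the input channels. The channel counts, kernel size, and group count are illustrative and do not come from the article.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights (no bias) in a 2D convolution with square kernels."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels.
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


dense = conv_weight_count(256, 256, 3)               # ordinary convolution
grouped = conv_weight_count(256, 256, 3, groups=32)  # 32 independent groups
print(dense, grouped, dense // grouped)
```

With the layer width held fixed, the weight count shrinks by a factor equal to the number of groups, and each group is an independent block of work.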

Graphcore also showed good results for Markov Chain Monte Carlo (MCMC)-based models, a new type of probabilistic algorithm which is used for modelling financial markets. This type of model has been out of reach for many in the finance industry, as it was previously considered too computationally expensive to use, said Graphcore. Early access IPU customers in the finance sector have been able to train their proprietary, optimised MCMC models in 4.5 minutes on IPUs, compared to over 2 hours with their existing hardware, a 26x speed up in training time.

Reinforcement learning (RL), another popular technique in modern AI algorithm development, can also be accelerated compared to typical existing solutions. Graphcore cited a factor of ten improvement in throughput for RL models, even before they are optimised for the IPU.

https://www.eetimes.com/document.asp?doc_id=1335297#

submitted by /u/downtownslim

[D] Machine Learning – WAYR (What Are You Reading) – Week 75

This is a place to share machine learning research papers, journals, and articles that you’re reading this week. If it relates to what you’re researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you’ve read.

Please try to provide some insight from your understanding, and please don’t post things which are already in the wiki.

Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links.

Previous weeks:

1-10     11-20    21-30    31-40    41-50    51-60    61-70    71-80
Week 1   Week 11  Week 21  Week 31  Week 41  Week 51  Week 61  Week 71
Week 2   Week 12  Week 22  Week 32  Week 42  Week 52  Week 62  Week 72
Week 3   Week 13  Week 23  Week 33  Week 43  Week 53  Week 63  Week 73
Week 4   Week 14  Week 24  Week 34  Week 44  Week 54  Week 64  Week 74
Week 5   Week 15  Week 25  Week 35  Week 45  Week 55  Week 65
Week 6   Week 16  Week 26  Week 36  Week 46  Week 56  Week 66
Week 7   Week 17  Week 27  Week 37  Week 47  Week 57  Week 67
Week 8   Week 18  Week 28  Week 38  Week 48  Week 58  Week 68
Week 9   Week 19  Week 29  Week 39  Week 49  Week 59  Week 69
Week 10  Week 20  Week 30  Week 40  Week 50  Week 60  Week 70

Most upvoted papers two weeks ago:

/u/adventuringraw: original TrueSkill paper from Microsoft

/u/Grimm___: http://proceedings.mlr.press/v67/gutierrez17a/gutierrez17a.pdf

Besides that, there are no rules, have fun.

submitted by /u/ML_WAYR_bot

[D] Transfer Learning for Survival Models

Survival models are similar to linear regression models, and in this case I am using an AFT (accelerated failure time) survival model. I have trained the model on one dataset and intend to use it to predict time to failure for another dataset. I would like to discuss the criteria that need to be met for such a transfer, how the transfer can be done, and which approaches I could consider. Thanks.
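
For reference, an AFT model is usually written as a linear model on the log of the failure time, which is where the resemblance to linear regression comes from. The notation below is the standard textbook form, not something taken from my actual setup:

```latex
\log T_i = \beta_0 + \beta^{\top} x_i + \sigma\,\varepsilon_i
```

Here $T_i$ is the time to failure, $x_i$ the covariates, $\sigma$ a scale parameter, and $\varepsilon_i$ an error term whose distribution determines the specific AFT variant (for example, an extreme-value error gives a Weibull model). Reusing the fitted $\beta_0$, $\beta$, and $\sigma$ on the second dataset presumes that both datasets share the same covariates and a similar relationship between covariates and failure time.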

submitted by /u/stat_leaf

[R] Neural Network Processing Neural Networks

I would like to share some research I have been working on in my spare time:

https://arxiv.org/abs/1911.05640

It is about another type of neural network, one that takes neural networks as inputs and/or produces them as outputs. According to my own experiments, these networks seem to do well, especially on search problems. I would be really grateful if anyone could provide some feedback.

submitted by /u/firat_tuna

[D] Statistical/ML analysis of intention + wordnets, phrasenets

I’m having a mental struggle right now trying to understand how I would go about programming this, and I’m not even sure it’s feasible.

The problem

Let’s say we’re analyzing song lyrics. Let’s say that hypothetically, whenever the word “darkness” is mentioned in a lyric, there is a 23% chance that the word “night” is also mentioned and a 14% chance that the word “doubt” is also in the lyric.

A second and more complex relationship would be that of phrases. We could imagine that whenever the word “darkness” is mentioned, there is a 3.2% chance that the phrase “I’m scared” is somewhere in the lyric and 0.9% chance that the phrase “going to die” is also there.

A third addition to the complexity would be to add sentiment analysis with a machine learning version of a wordnet that analyzes not only the related words but the related moods.

A fourth addition to the complexity would see morphosyntactical analysis. “I’m scared” is not a feasible assumption as there are many possible subjects in a “scared” sentence, but it would be more feasible for it to be frequent if we said “noun + [to be, present tense] + scared”. This would cover “I’m scared”, “he’s scared”, “we’re scared”, “my son is scared”, etc. And then we could add adverbs and sentence changes (‘our family is, therefore, exceptionally scared’).
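
As a rough illustration of the “noun + [to be, present tense] + scared” idea, here is a surface-level sketch using a regular expression. It is only a stand-in: a real system would use a part-of-speech tagger, and the pattern and function names below are hypothetical.

```python
import re

# Rough stand-in for "noun + [to be, present tense] + scared". This regex only
# looks at surface forms and does not check that the subject is really a noun.
SCARED_PATTERN = re.compile(
    r"\b(?P<subject>[A-Za-z]+)"          # last word of the subject
    r"(?:'m|'s|'re|\s+(?:am|is|are))"    # contracted or full present-tense "to be"
    r"\s+(?:[A-Za-z]+ly\s+)?"            # optional single adverb ending in -ly
    r"scared\b",
    re.IGNORECASE,
)


def find_scared_subjects(text):
    """Return the word directly before each '<to be> scared' construction."""
    normalized = text.replace("\u2019", "'")  # treat curly apostrophes as straight
    return [m.group("subject") for m in SCARED_PATTERN.finditer(normalized)]


print(find_scared_subjects("I'm scared, he's scared, and my son is really scared"))
```

A sentence with interruptions, such as “our family is, therefore, exceptionally scared”, would slip past this pattern, which is one reason a proper syntactic parse is the more robust route.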

The bad way

My current thoughts about it come from traditional programming. For that analysis, we would take a reference word, go through the rest of the corpus, count each occurrence of each corpus word, store all of those counts in an array belonging to the reference word, and then repeat that for every word in a text. That would be insanely expensive and would get nowhere.
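
For the word-level statistics in the first example, the counting can often be done in a single pass over the corpus with a counter of word pairs, instead of rescanning the corpus for every reference word. A rough sketch, using a made-up three-lyric corpus and hypothetical function names:

```python
from collections import Counter
from itertools import permutations


def cooccurrence_probabilities(lyrics):
    """Estimate P(b appears in a lyric | a appears in that lyric)."""
    word_counts = Counter()
    pair_counts = Counter()
    for lyric in lyrics:
        words = set(lyric.lower().split())  # presence per lyric, not frequency
        word_counts.update(words)
        pair_counts.update(permutations(words, 2))  # ordered pairs (a, b)
    return {
        (a, b): count / word_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Toy corpus, invented for illustration only.
lyrics = [
    "darkness falls at night",
    "darkness and doubt",
    "night after night",
]
probs = cooccurrence_probabilities(lyrics)
print(probs[("darkness", "night")])  # share of "darkness" lyrics that mention "night"
```

The number of pairs grows quadratically with the vocabulary of each lyric, so for long texts it may be worth restricting the pairs to a sliding window or to a list of reference words.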

The ideal but unknown way

A cheaper way to do this would be with an AI plus a vector or matrix datatype. I’ve been exploring the kinds of AI that exist, but I’m very new to this and don’t know which one is more appropriate or which analysis algorithm would be best. I’m not even sure whether it can be done with current technology in this exact way, or whether the results would differ from what I described. Perhaps an AI would not be as statistically accurate, and would instead give a 0-100 rating, not of the statistical tendency but of the “feel” it gets for how “similar” one word is to another based on their shared context. How accurate would this be statistically?
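
The “feel” for how similar two words are, based on shared context, is roughly what distributional similarity measures. Below is a small sketch of the idea with plain co-occurrence vectors and cosine similarity. Learned embeddings such as word2vec replace the raw counts with dense vectors, but the comparison step is much the same. The corpus and function names are invented for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(lyrics):
    """Map each word to counts of the other words sharing a lyric with it."""
    vectors = defaultdict(Counter)
    for lyric in lyrics:
        words = lyric.lower().split()
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus, invented for illustration only.
lyrics = [
    "darkness covers the city",
    "night covers the city",
    "sunshine warms my face",
]
vectors = context_vectors(lyrics)
print(cosine(vectors["darkness"], vectors["night"]))     # identical contexts
print(cosine(vectors["darkness"], vectors["sunshine"]))  # no shared context
```

Multiplying the cosine by 100 gives a 0-100 style rating. Note that it measures how alike two words’ contexts are, which is a different quantity from the conditional co-occurrence percentages described earlier.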


I’ve been excited about BERT recently, but I’m not experienced enough to draw my own conclusions on the topic.

  • How feasible do you think this would be?
  • What are your thoughts about the necessary implications and existing ways to approach them?
  • What similar projects do you know of that are being developed right now?
  • How would someone interested in this go into learning more about this specifically without much experience in machine learning in general?

submitted by /u/Live_Think_Diagnosis