Author: torontoai

[D] Debugging model performance discrepancy between offline eval and online exp

I recently had an interview with an online ads company. The interviewer asked me this question:

“If we expect a newly trained model to perform well in an online experiment, but the experiment results are quite negative, how would you debug it?”

My answer was: “It may be caused by overfitting. If so, we can change the model; e.g., if we are using a decision tree, we can switch to a random forest.”

The interviewer did not seem very satisfied with that answer, since he said switching models is heavyweight. I then suggested it could be a discrepancy in the features or in the data distribution. He then asked how to debug each of these two cases, and I got a little stuck.
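A minimal sketch of how those two checks could be made concrete (an illustration under my own assumptions, not the interviewer's expected answer). For feature discrepancy (training-serving skew), recompute features offline for requests the live system actually served, join them by request ID with the features it logged, and measure a per-feature mismatch rate. For data distribution discrepancy, compare each feature's offline distribution with a sample of recent online traffic using the Population Stability Index (PSI). The dict-of-dicts input format, the function names, and the default tolerances are hypothetical.

```python
import numpy as np

def feature_skew_rates(offline, online, tol=1e-6):
    """Fraction of shared requests where an offline-recomputed feature disagrees with the online log.

    `offline` and `online` are hypothetical dicts mapping request_id -> {feature_name: value}.
    """
    shared_ids = offline.keys() & online.keys()
    totals, mismatches = {}, {}
    for rid in shared_ids:
        for name, off_value in offline[rid].items():
            totals[name] = totals.get(name, 0) + 1
            on_value = online[rid].get(name)
            # A feature missing online, or differing by more than `tol`, counts as skew.
            if on_value is None or abs(off_value - on_value) > tol:
                mismatches[name] = mismatches.get(name, 0) + 1
    return {name: mismatches.get(name, 0) / totals[name] for name in totals}

def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI of one feature: offline sample (`expected`) vs. online sample (`actual`).

    Bin edges are quantiles of the offline sample; out-of-range online values land in the end bins.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    exp_idx = np.searchsorted(inner_edges, expected, side="right")
    act_idx = np.searchsorted(inner_edges, actual, side="right")
    exp_frac = np.bincount(exp_idx, minlength=n_bins) / len(expected)
    act_frac = np.bincount(act_idx, minlength=n_bins) / len(actual)
    # Clip empty bins so the log ratio stays finite.
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))
```

Skew rates well above zero for a feature point to a logging or feature-pipeline bug, while a large PSI with near-zero skew points to a genuine shift between the offline data and live traffic.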

I would like to hear your opinions.

submitted by /u/marksteve4

[Discussion] How to estimate the conditional probability (CDF) of a multivariate dataset?

Hi,

I am describing the problem as I face it in MATLAB, but if you have a solution even in Python, I would be very, very happy.

I was able to estimate the conditional probability (CDF) for a dataset with two features, X_1 and Y, i.e., P(X_1|Y), using the MATLAB function “quantilePredict”. It works great. However, when I consider three features, X_1, X_2, and Y, how can I find P(X_1,X_2|Y) without assuming conditional independence?
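Without conditional independence, the joint conditional CDF can still be written exactly with the chain rule (assuming X_1 has a conditional density given Y). This suggests fitting two single-output models, one for X_1 given Y and one for X_2 given (X_1, Y), instead of one two-output model:

```latex
F_{X_1,X_2 \mid Y}(x_1, x_2 \mid y)
  = P(X_1 \le x_1,\; X_2 \le x_2 \mid Y = y)
  = \int_{-\infty}^{x_1} F_{X_2 \mid X_1, Y}(x_2 \mid t, y)\, f_{X_1 \mid Y}(t \mid y)\, dt
```

Under conditional independence, F_{X_2|X_1,Y}(x_2|t,y) reduces to F_{X_2|Y}(x_2|y) and the expression collapses to the product F_{X_1|Y}(x_1|y) F_{X_2|Y}(x_2|y). Without that assumption, one way to approximate the integral is to draw values t from the first model's conditional distribution of X_1 given Y = y and average F_{X_2|X_1,Y}(x_2|t,y) times the indicator 1{t ≤ x_1} over the draws.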

How can I capture the covariance as well as the CDF while working with quantiles rather than the mean of the data? In the worst case, I would be fine with capturing the covariance and the CDF using the mean of the data.
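For the mean-plus-covariance fallback mentioned above, here is a minimal sketch assuming a conditional mean that is linear in Y and Gaussian residuals whose covariance does not depend on Y. It regresses (X_1, X_2) on Y, takes the 2x2 covariance of the residuals (which is where the dependence between X_1 and X_2 is kept), and evaluates a bivariate normal CDF with SciPy's `multivariate_normal`. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_linear_gaussian(y, x12):
    """Least-squares fit of E[(X_1, X_2) | Y] = a + b * Y, plus the residual covariance.

    Assumes a linear conditional mean and a covariance that does not depend on Y.
    """
    design = np.column_stack([np.ones_like(y), y])        # (n, 2): intercept and slope columns
    coef, *_ = np.linalg.lstsq(design, x12, rcond=None)   # (2, 2): one column per output
    residuals = x12 - design @ coef
    cov = np.cov(residuals, rowvar=False)                 # 2x2 covariance keeps the X_1/X_2 dependence
    return coef, cov

def gaussian_joint_cdf(coef, cov, y_query, x1, x2):
    """P(X_1 <= x1, X_2 <= x2 | Y = y_query) under the fitted Gaussian model."""
    mean = np.array([1.0, y_query]) @ coef
    return float(multivariate_normal(mean=mean, cov=cov).cdf([x1, x2]))
```

This ignores skewness and heavy tails, which is the price of summarizing the conditional distribution by its mean and covariance alone.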

TreeBagger is trained as a model f with Y as the input and X_1 as the output, i.e., X_1 = f(Y). We then use the trained TreeBagger model with “quantilePredict” to predict response quantiles. In the multivariate case, however, TreeBagger cannot fit data where the input is Y and the output is (X_1, X_2), i.e., (X_1, X_2) = f(Y). (This idea/point of view is probably wrong and naive.)
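One possible workaround, sketched in Python with scikit-learn rather than MATLAB: scikit-learn's `RandomForestRegressor` accepts a two-column target, so a single forest can be trained with Y as the input and (X_1, X_2) as the output. Borrowing the leaf-weighting idea behind quantile regression forests, each training row gets a weight according to how often it shares a leaf with the query value of Y, and those weights define a weighted empirical joint CDF, so the dependence between X_1 and X_2 is kept rather than assumed away. The function names and hyperparameters are illustrative assumptions, and this is not meant to reproduce MATLAB's “quantilePredict”.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_multioutput_forest(y_train, x12_train, n_trees=200, min_leaf=20, seed=0):
    """Fit one forest with Y as the single input and (X_1, X_2) as a two-column output."""
    forest = RandomForestRegressor(n_estimators=n_trees, min_samples_leaf=min_leaf, random_state=seed)
    forest.fit(y_train.reshape(-1, 1), x12_train)
    return forest

def leaf_weights(forest, y_train, y_query):
    """Weight of each training row for the query: shared-leaf membership averaged over trees."""
    train_leaves = forest.apply(y_train.reshape(-1, 1))    # (n_train, n_trees) leaf ids
    query_leaves = forest.apply(np.array([[y_query]]))[0]  # (n_trees,) leaf ids
    same_leaf = (train_leaves == query_leaves).astype(float)
    same_leaf /= same_leaf.sum(axis=0, keepdims=True)      # each tree's weights sum to 1
    return same_leaf.mean(axis=1)                          # average over trees; still sums to 1

def joint_conditional_cdf(weights, x12_train, x1, x2):
    """Weighted empirical estimate of P(X_1 <= x1, X_2 <= x2 | Y = y_query)."""
    inside = (x12_train[:, 0] <= x1) & (x12_train[:, 1] <= x2)
    return float(weights[inside].sum())
```

Passing `np.inf` for one threshold gives the corresponding marginal conditional CDF from the same weights, so marginals and the joint CDF stay consistent with each other.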

submitted by /u/askquestion001

[P] Simple and effective phrase finding in multi-language?

Dealing with out-of-vocabulary words or phrases has long been a problem in NLP, and using deep learning for it is sometimes too costly.

Maybe we can first use a simple statistical approach: finding potential phrases based on word boundaries.

How?

When an n-gram is extended one word at a time, its frequency drops sharply once it crosses a phrase boundary. For example, take this sentence from “Attention Is All You Need”:

…multi-head attention in three different ways…

multi-head: frequency 10
multi-head attention: frequency 8
multi-head attention in: frequency 1 <- drop!!
multi-head attention in three: frequency 1
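A toy illustration of the boundary-drop idea, assuming whitespace tokenization, lower-casing, and a hypothetical threshold that treats a frequency below half of the previous n-gram's frequency as a drop. The function names are made up for this sketch, and it is not Phraseg's actual implementation.

```python
from collections import Counter

def ngram_counts(sentences, max_n=5):
    """Count every n-gram (n = 1..max_n) over lower-cased, whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def phrase_at(tokens, start, counts, drop_ratio=0.5):
    """Extend an n-gram from `start` until its frequency falls below `drop_ratio` of the previous one.

    Returns the longest n-gram before the first sharp drop, i.e. a candidate phrase.
    """
    best_end = start + 1
    prev = counts[tuple(tokens[start:best_end])]
    for end in range(start + 2, len(tokens) + 1):
        cur = counts.get(tuple(tokens[start:end]), 0)
        if cur < drop_ratio * prev:  # sharp drop: we just crossed a phrase boundary
            break
        best_end, prev = end, cur
    return " ".join(tokens[start:best_end])
```

For example, on a toy corpus built so that the counts match the frequencies above (10, 8, 1), starting at "multi-head" stops before "in" and returns "multi-head attention".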

Capturing this drop gives us candidate phrases, so I created a library to help with this.

GitHub project – Phraseg

from phraseg import *  # assumes the library is installed as the `phraseg` package

phraseg = Phraseg(''' The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes it more difficult to learn dependencies between distant positions [12]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2. Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 27, 28, 22]. End-to-end memory networks are based on a recurrent attention mechanism instead of sequence- aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [34]. To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence- aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9]. ''')
result = phraseg.extract()

The result will be:

[('the Transformer', 3), ('of the', 2), ('ConvS 2 S', 2), ('input and output', 2), ('output positions', 2), ('number of operations', 2), ('In the', 2), ('attention mechanism', 2), ('to compute', 2)] 

Application

We can use this to explore daily trending GitHub repositories:

https://colab.research.google.com/drive/133uFefx7nMgeuah4FfHZjpqmqfxTyKui

Details on how it works:

https://medium.com/@voidful.stack/simple-and-effective-phrase-finding-in-multi-language-42264554acb

GitHub project:

https://github.com/voidful/Phraseg

submitted by /u/voidful-stack