Category: Reddit MachineLearning

[N] French BERT (CamemBERT) now available in Transformers library

The CamemBERT Transformer model (by Facebook AI, Inria and Sorbonne Université), trained on 138GB of French text, was added this morning to the huggingface/transformers model repository, and is now usable in both PyTorch and TensorFlow 2! Install the library from source to play around with it!

It is available alongside Chinese and German BERT models and other multi-lingual models.

CamemBERT improves the state of the art on several French NLP tasks, outperforming multi-lingual models on several of them. It’s based on RoBERTa’s training scheme but uses whole-word masking as well as SentencePiece tokenization.
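To illustrate what whole-word masking means on top of SentencePiece-style pieces, here is a minimal sketch. It assumes the usual SentencePiece convention that the first piece of each word starts with "▁"; the example tokenization is made up for illustration and is not the output of the real CamemBERT tokenizer. It also simplifies the masking step (every selected piece becomes `<mask>`), so it is not the actual CamemBERT training code.

```python
import random

WORD_START = "▁"  # SentencePiece marks the first piece of each word with this symbol
MASK = "<mask>"

def group_into_words(pieces):
    """Group piece indices so that each group covers one whole word."""
    words = []
    for i, piece in enumerate(pieces):
        if piece.startswith(WORD_START) or not words:
            words.append([i])      # a new word starts here
        else:
            words[-1].append(i)    # continuation piece of the current word
    return words

def whole_word_mask(pieces, mask_prob=0.15, seed=0):
    """Mask all pieces of each selected word, never only part of a word."""
    rng = random.Random(seed)
    masked = list(pieces)
    for word in group_into_words(pieces):
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = MASK
    return masked

# Hypothetical tokenization of "Le camembert est délicieux"
pieces = ["▁Le", "▁camem", "bert", "▁est", "▁dé", "lic", "ieux"]
print(group_into_words(pieces))
print(whole_word_mask(pieces, mask_prob=0.5, seed=1))
```

The point of the grouping step is that "▁camem" and "bert" are always masked together, so the model cannot recover a masked piece just by looking at the rest of the same word.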

submitted by /u/jikkii

[Research] 5-minute-survey

Hello everyone, I’m working on an article for my PhD studies and I need to gather about 100 answers to my survey. It takes about 5 minutes and consists of 3 parts:

1 – 12 straightforward questions of the same type – you may feel a bit confused, but it is made that way on purpose

2 – Explanation part with no questions

3 – the same 12 questions, now with the explanation from part 2 behind you

I would be very grateful if you could fill it in, and thank you in advance. Here’s the link: https://www.survio.com/survey/d/T4X7T4P1F2V9S8D3P

Of course, if you have any questions about the research, feel free to ask.

submitted by /u/AgentThompson95

[D] Many papers don’t do hyperparameter search on DNN baselines

Something I noticed after reading various DNN model papers is that they often don’t seem to perform a hyperparameter search on their own models or on the baseline models. Many reported results seem to be for hand-picked configurations only. No search methods (like grid search, Bayesian optimization or even random search) appear to have been used to find the best-performing configurations.

IMO this is a problem: The performance of a DNN model really depends on the choice of hyperparameters, so hypothetically you could make a baseline model perform badly by picking poor hyperparameters.
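To make the concern concrete, here is a minimal sketch of random search compared with a single hand-picked configuration. The `validation_score` function is a made-up stand-in for "train the model and return validation accuracy", and its optimum (learning rate 1e-3, dropout 0.3) is an arbitrary assumption; none of the numbers come from a real paper or model.

```python
import math
import random

def validation_score(learning_rate, dropout):
    """Toy stand-in for training a model and measuring validation accuracy.

    The peak at learning_rate=1e-3, dropout=0.3 is an arbitrary assumption.
    """
    lr_term = (math.log10(learning_rate) + 3.0) ** 2
    dropout_term = (dropout - 0.3) ** 2
    return 0.9 - 0.05 * lr_term - 0.5 * dropout_term

def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_config = float("-inf"), None
    for _ in range(n_trials):
        config = {
            "learning_rate": 10 ** rng.uniform(-5, -1),  # log-uniform sampling
            "dropout": rng.uniform(0.0, 0.6),
        }
        score = validation_score(**config)
        if score > best_score:
            best_score, best_config = score, config
    return best_score, best_config

# A poorly chosen, hand-picked baseline configuration
hand_picked = validation_score(learning_rate=1e-1, dropout=0.0)
searched, config = random_search(n_trials=30)
print(f"hand-picked baseline: {hand_picked:.3f}")
print(f"random search, 30 trials: {searched:.3f} with {config}")
```

In this toy setup the same "model" looks noticeably worse under the hand-picked configuration than under the searched one, which is exactly how an untuned baseline could be made to look weak.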

Why are so many big papers with such an incomplete evaluation out there? Or am I missing something here and it is enough to look at one configuration only?

submitted by /u/alex19111

[D] An Interesting (in my opinion) Observation While Messing With the Full GPT-2

When playing around with an online implementation of the full model (talktotransformer.com), I noticed it can do something that is (to me) really cool: It can complete analogies! If you use an open quote and start an analogy leaving out the last word, it can often get it right! This is interesting because the model was not, to my knowledge, trained to do this, so it must be an emergent result of “understanding” the English language! I’ve so far tried a number of different analogies in varying orders, for example ‘”Angry is to Anger as Afraid is to’ and ‘”Big is to Bigger as Small is to’, and while it doesn’t ALWAYS get it right, it does more often than not.

I tried this on the earliest, incomplete model they released and it failed, so this seems to be unique to the full model (although I never tried with any of the models of intermediate complexity that they released in between with their staged release plan, so I can’t confirm at which point it gained the ability).

Has anyone else noticed this? And am I alone in thinking it’s cool? It may not be as flashy as writing an article, but it shows a level of “understanding” that things like Markov chains, etc., can’t generally match IME.

submitted by /u/Argenteus_CG

[D] Bahdanau attention model

Hello. I am a machine learning enthusiast. Recently I got interested in NLP, and I found the attention model interesting.

I was reading https://arxiv.org/pdf/1409.0473.pdf

and I couldn’t find how to compute Cz, Cr, and C.

You can find them in the paper, Appendix A where they explain how to compute update gates and reset gates.

I have searched on Google, but it seems like people don’t mention how to compute them.

  1. How to compute Cz, Cr, and C in the Bahdanau attention model?
  2. Where should I ask these questions? (I am new to machine learning and don’t have anyone to ask in person.)
  3. Am I focusing on too much detail? Should I just use libraries that have pre-built attention models? Actually, I am working on a simple chatbot project.
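For reference, as far as Appendix A of the linked paper goes, C, C_z and C_r do not seem to be computed from other quantities. They appear as weight matrices that are learned together with the rest of the parameters, and they multiply the context vector c_i inside the decoder's gated hidden unit. A sketch of those equations, where e(y_{i-1}) denotes the embedding of the previous target word, σ is the logistic sigmoid and ∘ is element-wise multiplication:

```latex
\begin{aligned}
s_i &= (1 - z_i) \circ s_{i-1} + z_i \circ \tilde{s}_i \\
\tilde{s}_i &= \tanh\!\left(W e(y_{i-1}) + U\left[r_i \circ s_{i-1}\right] + C c_i\right) \\
z_i &= \sigma\!\left(W_z e(y_{i-1}) + U_z s_{i-1} + C_z c_i\right) \\
r_i &= \sigma\!\left(W_r e(y_{i-1}) + U_r s_{i-1} + C_r c_i\right) \\
c_i &= \sum_{j=1}^{T_x} \alpha_{ij} h_j, \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
e_{ij} = v_a^{\top} \tanh\!\left(W_a s_{i-1} + U_a h_j\right)
\end{aligned}
```

Here z_i is the update gate and r_i is the reset gate. With n hidden units and a bidirectional encoder, c_i has 2n entries, so C, C_z and C_r each have shape n × 2n. The only quantity that is actually computed at each step is c_i, the attention-weighted sum of the encoder annotations h_j.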

submitted by /u/wyzkssm

[D] What should I do?

Hi, I’m a math major at the University of Alberta, with a 3.8 GPA.

I’m not anything super special, and I think that my ability in math is pretty subpar and I’m probably not capable of doing a PhD in math. ML is something that has always interested me (I read a good chunk of Pattern Recognition and Machine Learning and all of Reinforcement Learning over the past year, and worked as an MLE for a small stint), but I have 0 research experience.

My big pluses would be:

  • 3.8 GPA in mostly pure math isn’t too shabby (got a B- in Real Analysis II though, which looks very very bad for PhD applications in pure math)
  • A+ achieved in the introduction to machine learning course offered at my university
  • Currently on the final stretch of an internship at LinkedIn as an Infra SWE, built a cute little compiler which generates linear algebra kernels for sparse tensors
  • Going to intern at Jane Street Capital next summer (prestige-wise it’s pretty much the best an undergrad could do in terms of SWE)

My big minuses would be:

  • 0 research experience
  • B- in Real Analysis II (got an A in Real Analysis I though)
  • Have not written the GRE (pretty much limits me to masters programs in Canada I think)

I’m mainly concerned with getting a PhD anywhere; I’m not too concerned with getting one somewhere prestigious. I have 3 semesters left in my degree, all of which are light (I have 4 very difficult math courses left, 1 CS course (compilers), 2 English courses and 4 arts courses, which I plan to break into semesters of 4, 4, 3 courses). I have some background in compilers (I read a good portion of Engineering A Compiler while I was at LinkedIn in order to do my project), which might be an interesting intersection. I’m taking a Reinforcement Learning class next semester, and I’m preparing to knock it out of the park.

What should I do within my last 3 semesters in order to maximize the quality of my PhD acceptances come the end of my undergrad?

submitted by /u/OriginalMoment

[R] Is there any way to incorporate dictionary definitions as features into an NLP system?

I’m currently trying to use the dictionary definitions of words as features. However, I’m having trouble finding a source that actually provides them.

I understand that there are many techniques that allow us to leverage the information that words carry (e.g. WordNet, Word2Vec, GloVe, ELMo, etc.) but these aren’t exactly the “dictionary definition” that I’m looking for. For example, WordNet takes advantage of the hierarchical relationship among words, and Word2Vec and GloVe tell us how similar two words are. Not exactly a “dictionary definition” in my opinion.

Does anybody know if there exists any source out there that provides such features? Thanks in advance.
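One possible direction, sketched under stated assumptions: treat each word's definition text as a small document and turn it into a bag-of-words vector. The glosses below are hypothetical and were written only for this example. In practice the definition text would have to come from a real source; WordNet synsets do carry a short gloss, which NLTK exposes as `Synset.definition()`, though whether those glosses are detailed enough for this purpose is an open question.

```python
from collections import Counter

# Hypothetical glosses, written for this example only (not from a real dictionary).
TOY_DEFINITIONS = {
    "dog": "a domesticated animal that barks",
    "cat": "a small domesticated animal that purrs",
    "car": "a road vehicle with an engine",
}

def build_vocab(definitions):
    """Map every word that occurs in any definition to a column index."""
    words = sorted({w for gloss in definitions.values() for w in gloss.split()})
    return {w: i for i, w in enumerate(words)}

def definition_features(word, definitions, vocab):
    """Bag-of-words count vector built from the word's definition."""
    counts = Counter(definitions[word].split())
    return [counts.get(w, 0) for w in vocab]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

vocab = build_vocab(TOY_DEFINITIONS)
dog = definition_features("dog", TOY_DEFINITIONS, vocab)
cat = definition_features("cat", TOY_DEFINITIONS, vocab)
car = definition_features("car", TOY_DEFINITIONS, vocab)
print("dog vs cat:", round(cosine(dog, cat), 3))
print("dog vs car:", round(cosine(dog, car), 3))
```

The count vectors could be swapped for an encoding of the definition sentence by any sentence encoder; the sketch only shows where the definition text would enter the feature pipeline.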

submitted by /u/Seankala