Transformers are currently beating the state of the art on a range of NLP tasks.
Some examples are:
Something I noticed is that in all of these papers the models are massive, with on the order of 20 layers and hundreds of millions of parameters.
Of course, using larger models is a general trend in NLP, but it raises the question of whether small transformers are any good. I recently had to train a sequence-to-sequence model from scratch, and I was unable to get better results with a transformer than with LSTMs.
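For concreteness, here is a rough sketch (PyTorch) of what I mean by a "small" transformer next to an LSTM seq2seq of comparable size. The vocabulary size and all hyperparameters are illustrative placeholders, not the actual setup from my experiment:

```python
# Minimal sketch: a "small" transformer seq2seq vs. an LSTM seq2seq.
# Hyperparameters and VOCAB are hypothetical, chosen only to show scale.
import torch
import torch.nn as nn

VOCAB = 10_000  # assumed vocabulary size

class SmallTransformerSeq2Seq(nn.Module):
    def __init__(self, d_model=256, nhead=4, layers=3, ff=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            dim_feedforward=ff, batch_first=True)
        self.out = nn.Linear(d_model, VOCAB)

    def forward(self, src, tgt):
        # NOTE: positional encodings and causal target masking are omitted
        # for brevity; a real training setup needs both.
        h = self.transformer(self.embed(src), self.embed(tgt))
        return self.out(h)

class LSTMSeq2Seq(nn.Module):
    def __init__(self, d_model=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.encoder = nn.LSTM(d_model, d_model, layers, batch_first=True)
        self.decoder = nn.LSTM(d_model, d_model, layers, batch_first=True)
        self.out = nn.Linear(d_model, VOCAB)

    def forward(self, src, tgt):
        _, state = self.encoder(self.embed(src))   # final (h, c) as context
        h, _ = self.decoder(self.embed(tgt), state)
        return self.out(h)

def count_params(m):
    return sum(p.numel() for p in m.parameters())

# Both land at a few million parameters, far below the 100s of millions
# used in the published models.
print(count_params(SmallTransformerSeq2Seq()))
print(count_params(LSTMSeq2Seq()))
```

At this scale the two models have roughly similar parameter counts, which is the regime where my transformer did not beat the LSTM.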
I am wondering if someone here has had similar experiences or knows of any papers on this topic.
submitted by /u/djridu