[D] Are small transformers better than small LSTMs?

Transformers currently hold the state of the art on a range of NLP tasks.

Some examples are:

  • Machine translation: Transformer Big + BT
  • Named entity recognition: BERT large
  • Natural language inference: RoBERTa

Something I noticed is that in all of these papers the models are massive, often with 20+ layers and hundreds of millions of parameters.

Using larger models is of course a general trend in NLP, but it raises the question of whether small transformers are any good. I recently had to train a sequence-to-sequence model from scratch, and I was unable to get better results with a transformer than with an LSTM.
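One thing worth checking in a comparison like this is whether the "small" models are actually matched in size. As a rough sketch (assuming PyTorch-style parameterization, with 4-gate LSTMs carrying two bias vectors and transformer encoder layers using biased Q/K/V/output projections, a two-layer feed-forward block, and two LayerNorms), per-layer parameter counts can be computed directly:

```python
def lstm_layer_params(d_in, d_hidden):
    # 4 gates, each with input weights, recurrent weights,
    # and two bias vectors (b_ih and b_hh, PyTorch convention).
    return 4 * (d_hidden * d_in + d_hidden * d_hidden + 2 * d_hidden)

def transformer_encoder_layer_params(d_model, d_ff):
    # Self-attention: Q, K, V, and output projections (weights + biases).
    attn = 4 * (d_model * d_model + d_model)
    # Position-wise feed-forward: two linear layers (weights + biases).
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model
    # Two LayerNorms, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    return attn + ffn + norms

# At the same width (d=256, with the common d_ff = 4*d choice),
# one transformer layer is noticeably larger than one LSTM layer:
print(lstm_layer_params(256, 256))                 # 526336
print(transformer_encoder_layer_params(256, 1024)) # 789760
```

So at equal width and depth a transformer stack carries roughly 1.5x the parameters of the LSTM stack here; matching total parameter count (e.g. by shrinking d_ff or d_model) makes a small-model comparison fairer.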

I am wondering if someone here has had similar experiences or knows of any papers on this topic.

submitted by /u/djridu

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.