[D] Struggled with reading deep learning papers

Actually, as a senior graduate student, I have been doing research in deep learning/NLP for several years. But there is a problem that has troubled me a lot during these years. Specifically, many deep learning papers (especially those introducing a new model for a very specific task, for example reading comprehension or text-to-SQL) give me the feeling that parts of the model design are highly engineered and not very intuitive. In other words, there could be many alternative designs for a given module, but few papers really justify in depth why they adopt their specific design. For instance, in a seq2seq setting, some papers use BERT directly as the encoder, while others use BERT to generate embeddings for the input sequence first and then feed those embeddings to an LSTM encoder. In fact, this example does not fully reveal the problem, since there are other scenarios with tons of different possible designs that might work, and different papers always adopt their very own design with little justification.
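To make the seq2seq example concrete, here is a minimal Python sketch of the two encoder designs as interchangeable modules. `toy_bert` and `toy_lstm` are hypothetical stand-ins, not BERT or an LSTM: they only mimic the interface (a list of tokens in, one vector per token out) so that the sketch runs with the standard library alone.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[List[str]], List[Vector]]


def toy_bert(tokens: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a pretrained contextual embedder such as BERT.

    Maps each token to a 2-d vector (token length, position). A real model
    would return learned contextual hidden states instead.
    """
    return [[float(len(tok)), float(i)] for i, tok in enumerate(tokens)]


def toy_lstm(vectors: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a recurrent encoder such as an LSTM.

    Keeps a running mean of the inputs from left to right, so each output
    depends on every earlier input, the defining property of a recurrent layer.
    """
    outputs: List[Vector] = []
    state = [0.0] * len(vectors[0]) if vectors else []
    for t, vec in enumerate(vectors, start=1):
        # Incremental mean update: state_t = state_{t-1} + (x_t - state_{t-1}) / t
        state = [s + (v - s) / t for s, v in zip(state, vec)]
        outputs.append(list(state))
    return outputs


def design_a() -> Encoder:
    # Design A: use the pretrained embedder directly as the encoder.
    return toy_bert


def design_b() -> Encoder:
    # Design B: pretrained embeddings first, then a recurrent encoder on top.
    return lambda tokens: toy_lstm(toy_bert(tokens))


if __name__ == "__main__":
    tokens = ["select", "name", "from", "users"]
    print(design_a()(tokens))
    print(design_b()(tokens))
```

Because both designs expose the same `Encoder` type, the rest of a seq2seq model can stay fixed while only this module changes, which is exactly the kind of swap a paper would need to justify.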

This really makes me feel extremely bad! First, as a guy who is always eager to know WHY, I find that those papers really can't answer my questions, or maybe it's just not smart to ask why questions in the context of deep learning model design. It makes research in this field look more like engineering, or even art, than science. Second, these varied designs make it really difficult to compare different models. It's really hard to control the variables! If one model achieves better performance than another, it's hard to tell whether that is truly due to what the paper claims or to some other subtle and tricky design choice.
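"Controlling the variables" is what an ablation or controlled comparison tries to do: change one module, keep everything else (data, seeds, training budget) fixed, and report the seed-to-seed spread alongside the mean. A minimal sketch, assuming a hypothetical `train_eval(variant, seed)` function that trains and scores one configuration; `toy_train_eval` and its base scores are made up purely for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List

# Hypothetical training-and-evaluation routine: (variant name, seed) -> score.
TrainEval = Callable[[str, int], float]


def compare_variants(
    train_eval: TrainEval, variants: List[str], seeds: List[int]
) -> Dict[str, Dict[str, float]]:
    """Run every variant on the same seeds and summarize mean and spread.

    Everything except the variant is held fixed inside train_eval, so a gap
    in means can be weighed against the seed-to-seed noise ('stdev').
    """
    results: Dict[str, Dict[str, float]] = {}
    for name in variants:
        scores = [train_eval(name, seed) for seed in seeds]
        results[name] = {
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        }
    return results


def toy_train_eval(variant: str, seed: int) -> float:
    """Hypothetical stand-in for training a model: made-up base score plus seed noise."""
    rng = random.Random(seed)  # same seed gives the same noise for every variant
    base = {"design_a": 0.80, "design_b": 0.81}[variant]
    return base + rng.uniform(-0.02, 0.02)


if __name__ == "__main__":
    summary = compare_variants(toy_train_eval, ["design_a", "design_b"], [0, 1, 2, 3, 4])
    for name, stats in summary.items():
        print(name, stats)
```

In this toy run the two designs share identical noise per seed, so their means differ only by the built-in 0.01 gap; with real training runs, a gap smaller than the reported spread would be hard to attribute to the design change alone.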

I don’t know whether other people feel the same way I do. How should I adjust my mindset for doing research in this field?

submitted by /u/entslscheia