[D] Struggled with reading deep learning papers
Actually, as a senior graduate student, I have been doing research in deep learning/NLP for several years, but one problem has troubled me throughout. Specifically, many deep learning papers (especially those introducing a new model for some very specific task, e.g. reading comprehension, text-to-SQL, etc.) give me the feeling that parts of the model design are highly engineered and not that intuitive. In other words, a given module could have many alternative designs, yet few papers really justify in depth why they adopted their specific one. For instance, in a seq2seq setting, some papers directly use BERT as the encoder, while others use BERT to generate embeddings for the input sequence first and then feed those embeddings into an LSTM encoder. In fact, this example doesn't capture the problem completely, since there are other scenarios with tons of different possible designs that might work, and different papers always adopt their very own design with little justification.
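To make the two alternatives concrete, here is a minimal PyTorch sketch of the encoder designs I mean. Everything here is illustrative: a small `nn.TransformerEncoder` stands in for a pretrained BERT (so the snippet runs without downloading weights), and the class names `DesignA`/`DesignB` are my own, not from any paper.

```python
import torch
import torch.nn as nn

HIDDEN = 64  # stand-in for BERT's 768-dim hidden size

# Stand-in for a pretrained contextual encoder (e.g. BERT).
bert_like = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True),
    num_layers=2,
)

class DesignA(nn.Module):
    """Design A: use the BERT-like model directly as the seq2seq encoder."""
    def __init__(self):
        super().__init__()
        self.encoder = bert_like

    def forward(self, x):        # x: (batch, seq_len, HIDDEN)
        return self.encoder(x)   # contextual states go straight to the decoder

class DesignB(nn.Module):
    """Design B: treat BERT-like outputs as embeddings, then re-encode with an LSTM."""
    def __init__(self):
        super().__init__()
        self.embedder = bert_like
        self.lstm = nn.LSTM(HIDDEN, HIDDEN, batch_first=True)

    def forward(self, x):
        emb = self.embedder(x)
        out, _ = self.lstm(emb)  # the LSTM's states feed the decoder instead
        return out

x = torch.randn(2, 5, HIDDEN)    # dummy batch: 2 sequences of length 5
a = DesignA()(x)
b = DesignB()(x)
print(a.shape, b.shape)          # both yield encoder states of shape (2, 5, 64)
```

Both variants expose the same interface to the decoder, which is exactly why a paper can swap one for the other without being forced to explain the choice.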
This really makes me feel extremely bad! First, as someone who is always eager to know WHY, those papers really can't answer my question, or maybe it's just not smart to ask why-questions in the context of deep learning model design. It makes research in this field look more like engineering or even art design rather than science. Secondly, those various designs make it really hard to compare different models, because it's hard to control the variables! If one model achieves better performance than another, it's hard to tell whether that is truly due to what the paper claims or to some other subtle and tricky design choice.
I don't know whether other people feel the same way. How should I adjust my mindset for doing research in this field?