[R] Is ELMo equivalent to fastText + bi-directional GRU?
From what I have read, ELMo uses bi-directional LSTM layers to produce contextual embeddings for the words in a sentence. So if I put a bi-directional LSTM/GRU layer on top of fastText word representations, will that be the same thing? If not, why not? (I know that fastText works at the sub-word level while ELMo works at the character level.)
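To make the sub-word point concrete, here is a rough sketch of how I understand fastText breaks a word into character n-grams: the word is wrapped in `<` and `>` boundary markers and every n-gram within a length range is collected. The function name `char_ngrams` and its parameters are my own for illustration and are not the fastText API; as far as I know the real library also hashes the n-grams into buckets and keeps a separate vector for the whole word, which this skips.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Collect character n-grams of a word wrapped in boundary markers.

    Illustrative only: hypothetical helper, not part of the fastText library.
    """
    wrapped = f"<{word}>"  # "<" and ">" mark the start and end of the word
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


trigrams = char_ngrams("where", 3, 3)
print(trigrams)  # ['<wh', 'whe', 'her', 'ere', 're>']
```

The word vector is then built from the vectors of these n-grams, so it is still one fixed vector per word type no matter which sentence the word appears in.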
Also, does it make sense to use a bi-directional LSTM/GRU layer over the representations produced by ELMo?
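For anyone unsure what I mean by a bi-directional layer over per-word vectors, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the 2-d vectors in `STATIC` are made-up stand-ins for pretrained embeddings, and a plain element-wise tanh recurrence with fixed scalar weights stands in for a real GRU/LSTM (no gates, no learned matrices). It only shows that the same static vector for "bank" turns into different outputs depending on its neighbours.

```python
import math

# Made-up 2-d vectors standing in for pretrained static word embeddings.
STATIC = {
    "river": [1.0, 0.0],
    "bank": [0.5, 0.5],
    "money": [0.0, 1.0],
}


def rnn_pass(vectors, w_in=0.8, w_rec=0.5):
    """Plain tanh recurrence (a stand-in for a GRU); one state per token."""
    state = [0.0] * len(vectors[0])
    states = []
    for vec in vectors:
        # New state mixes the current input with the previous state.
        state = [math.tanh(w_in * x + w_rec * h) for x, h in zip(vec, state)]
        states.append(state)
    return states


def bidirectional(tokens):
    """Concatenate left-to-right and right-to-left states for each token."""
    vectors = [STATIC[t] for t in tokens]
    forward = rnn_pass(vectors)
    # Run over the reversed sequence, then flip back to token order.
    backward = rnn_pass(vectors[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]


ctx_a = bidirectional(["river", "bank"])[1]  # "bank" after "river"
ctx_b = bidirectional(["money", "bank"])[1]  # "bank" after "money"
print(ctx_a != ctx_b)  # True: same static input, different contextual output
```

In a real model the recurrent weights would be learned, either on the downstream task or, as I understand ELMo does, by pretraining with a language-modelling objective first.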
The task that I am working on is extreme multi-label classification of documents.
submitted by /u/atif_hassan