[P] How do I embed sparse data before inputting into LSTM?
I am trying to understand an LSTM setup similar to the one in a paper that passes medical concepts through an embedding layer before feeding them to an LSTM.
A screenshot of the relevant figure is below:
The paper states:
1) There are 1837 features representing medical concepts. These concepts are textual medical codes such as “CPT 9002”
2) These 1837 features are put into an embedding layer with an output of shape (1837 x (Demb - 1))
3) A frequency vector of those features is then concatenated with that output, adding one column and making the total output shape (1837 x Demb)
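To make the shapes concrete, here is a minimal sketch of how I currently read those three steps. This is only my interpretation, not something the paper confirms: I am assuming one embedding row per concept and one frequency count per concept appended as the last column. `D_EMB = 8`, the random table, the concept indices, and the helper `timestep_matrix` are all made up for illustration.

```python
import random

NUM_CONCEPTS = 1837  # number of medical concepts, from the paper
D_EMB = 8            # hypothetical value, chosen only to show the shapes

random.seed(0)

# Stand-in for a learned embedding table: one row of length D_EMB - 1 per concept.
embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(D_EMB - 1)]
    for _ in range(NUM_CONCEPTS)
]


def timestep_matrix(counts):
    """Append each concept's frequency as the last column of its embedding row.

    counts maps a concept index to how often it occurred in this timestep.
    Concepts that did not occur get a frequency of 0.
    """
    return [
        row + [float(counts.get(i, 0))]
        for i, row in enumerate(embedding_table)
    ]


# A sparse timestep: only 3 of the 1837 concepts occur (indices are invented).
counts = {12: 2, 400: 1, 1836: 5}
matrix = timestep_matrix(counts)

print(len(matrix), len(matrix[0]))  # 1837 8
```

Under this reading the result always has 1837 rows, no matter how few concepts occur in the timestep, and the sparsity only shows up in the frequency column.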
My question is: medical data is very sparse, and often only a small fraction of the medical concepts appear at a particular time step. For example, only 10 of the 1837 features might have data for one time step. So how do I go about creating the input for this embedding layer in practice?
Assuming I have 10 out of the 1837 features available for a timestep, would the input to the embedding look like:
1) A vector of length 10 representing the available data? If so, why would the paper say that the output is 1837 x (Demb - 1)?
2) A vector of length 1837 containing 1s and 0s indicating which features were available for this timestep? If so, why would you need to concatenate the frequencies to the output?
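For concreteness, this is how I would build the two candidate inputs, plus the frequency vector, from the same sparse timestep. It is only a sketch: the concept indices and counts in `observed` are invented, and I am assuming each code has already been mapped to an integer index in the range 0 to 1836.

```python
NUM_CONCEPTS = 1837

# Hypothetical timestep: concept index -> number of occurrences (10 concepts present).
observed = {3: 1, 57: 2, 101: 1, 250: 4, 512: 1,
            777: 1, 900: 3, 1024: 1, 1500: 2, 1836: 1}

# Option 1: variable-length list holding only the indices that are present.
index_input = sorted(observed)

# Option 2: fixed-length multi-hot vector, 1 where a concept is present.
multi_hot = [0] * NUM_CONCEPTS
for idx in observed:
    multi_hot[idx] = 1

# Frequency vector: same length as option 2, but holding counts instead of 0/1.
freq = [0] * NUM_CONCEPTS
for idx, count in observed.items():
    freq[idx] = count

print(len(index_input), len(multi_hot), sum(multi_hot), sum(freq))  # 10 1837 10 17
```

One thing I notice here is that the multi-hot vector can be recovered from the frequency vector (it is just `freq > 0`), so the frequencies carry the presence information plus the counts.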
I am just really confused about how to create the input vector in practice, and any information would be very helpful.