[P] How to encode time information in LSTMs and perform weighted averaging where the weights are learned
I am trying to build a model based on a recent google paper called “Scalable and Accurate Deep Learning for Electronic Health Records”. The paper describes how to embed medical data for downstream prediction tasks.
I am using Pytorch and trying to modify an existing template I found online (https://github.com/yuchenlin/lstm_sentence_classifier/blob/master/LSTM_sentence_classifier.py)
In the supplementary material, they have a section describing how to embed medical data by doing the following (pic attached below too):
1) Take the raw medical data per patient per time step and embed them (like how its done in word embeddings)
2) Concatenate all history of that patient for that timestep together into a long vector
2a) For multiple information of the same type, average embeddings using a learned weighting. The weighted averaging is done by associating each feature with a non-negative weight that is trained jointly with the model.
3) Create another vector that contains time information (seconds)
Finally, in the supplementary material, it states that:
The sequence of embeddings were further reduced down to a shorter sequence. Typically, the shorter sequences were split into time-steps of 12 hours where the embeddings for all features within a category in the same day were combined using weighted averaging. The weighted averaging is done by associating each feature with a non-negative weight that is trained jointly with the model. These weights are also used for prediction attribution. The log of the average time-delta divided by a factor (controlled by a hyperparameter) at each time-step is also embedded into a small floating-point vector (which is also randomly initialized) and concatenated to the input embedding at each time-step.
My first question is: How do I actually go about inserting this time vector?
The supplementary states that the sequences were split into 12-hour time steps. If that’s the case, why the need for a time vector? It then later states that you can just simply concatenate the time vector to the input embedding. Why would a simple concatenation like that work and how would the model know to associate that time information with the appropriate part of the input?
My second question is: For Step 2a), how do I go about getting a weighted average for the same-category features? That part is glossed over without much explanation and it is lost on me.
I’m sorry if these are a lot of questions, but even pointing to some tutorials or any readings would go a long way in the right direction!