[D] Spatio-temporal modeling, scalar input
I’m doing video prediction research, i.e. predicting the next frame in a video sequence. In essence, it’s just a mapping from the past frame to the future frame. I wonder how I can incorporate a scalar input in addition to the input frame.
Since I’m only using CNN operations and never flatten the feature maps, I cannot concatenate the scalar input directly. I have found this, which suggests that one could treat the bias of some CNN layer as the scalar input, but doing so does not directly attach any learnable parameters to the scalar input.
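To make the question concrete, here is a rough NumPy sketch of two ways the scalar could enter the network without any flattening. Both are just my assumptions about how this might be done, not something taken from a paper: (1) broadcast the scalar into a constant H×W plane and concatenate it as an extra input channel, so the next conv layer learns weights for it, and (2) use the scalar to produce a per-channel bias through a learned weight and offset, which would attach parameters to the scalar. The function names, the `(N, C, H, W)` layout, and the `weight`/`bias` arrays are all hypothetical.

```python
import numpy as np


def append_scalar_channel(frames, scalar):
    """Broadcast one scalar per sample into a constant plane and append it
    as an extra channel.

    frames: array of shape (N, C, H, W)
    scalar: array-like of shape (N,)
    returns: array of shape (N, C + 1, H, W)
    """
    n, _, h, w = frames.shape
    s = np.asarray(scalar, dtype=frames.dtype).reshape(n, 1, 1, 1)
    plane = np.broadcast_to(s, (n, 1, h, w))
    return np.concatenate([frames, plane], axis=1)


def scalar_conditioned_bias(features, scalar, weight, bias):
    """Add a per-channel bias that is an affine function of the scalar.

    features: array of shape (N, C, H, W)
    scalar:   array-like of shape (N,)
    weight, bias: arrays of shape (C,), hypothetical learnable parameters
    returns: array of shape (N, C, H, W)
    """
    s = np.asarray(scalar, dtype=features.dtype)
    shift = s[:, None] * weight[None, :] + bias[None, :]  # shape (N, C)
    return features + shift[:, :, None, None]  # broadcast over H and W


if __name__ == "__main__":
    frames = np.zeros((2, 3, 4, 5))
    scalar = np.array([0.5, -1.0])
    print(append_scalar_channel(frames, scalar).shape)
    print(scalar_conditioned_bias(frames, scalar, np.ones(3), np.zeros(3)).shape)
```

With option (1) the following convolution sees the scalar at every spatial location; with option (2) the scalar shifts each feature map by a learned amount, and `weight` and `bias` would be trained together with the rest of the network.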
Does anyone have any experience with this? All info, papers, etc. are appreciated!