[D] What beats concatenation?
Let’s say we have two (or more) embedding spaces, each learned from a different data space.
There is one global task T that all embedding spaces are evaluated on.
To perform better on T than any single embedding space would on its own, the obvious approach is to concatenate, for each item, its vectors from all of the embedding spaces. But is there a better method than simple concatenation?
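
To make the baseline concrete, here is a minimal sketch of what I mean by concatenation, assuming each embedding space is a NumPy array with one row per item and the rows are aligned across spaces. The function name `concat_embeddings`, the optional per-space L2 normalization, and the random data are my own illustrative assumptions, not part of any particular library or setup.

```python
import numpy as np

def concat_embeddings(spaces, normalize=True):
    """Concatenate per-item vectors from several embedding spaces.

    spaces: list of arrays, each of shape (n_items, d_k), rows aligned by item.
    normalize: if True, L2-normalize each space first (an assumption, so that
               a space with large magnitudes does not dominate the result).
    """
    parts = []
    for emb in spaces:
        emb = np.asarray(emb, dtype=float)
        if normalize:
            norms = np.linalg.norm(emb, axis=1, keepdims=True)
            emb = emb / np.maximum(norms, 1e-12)  # avoid division by zero
        parts.append(emb)
    # Result has shape (n_items, d_1 + d_2 + ...)
    return np.concatenate(parts, axis=1)

# Hypothetical data: 5 items embedded in an 8-d space and a 3-d space.
rng = np.random.default_rng(0)
space_a = rng.normal(size=(5, 8))
space_b = rng.normal(size=(5, 3)) * 100  # deliberately on a different scale
combined = concat_embeddings([space_a, space_b])
print(combined.shape)
```

The combined matrix would then be fed to whatever model is evaluated on T. The normalization step is only there because raw concatenation lets the space with the largest scale dominate distance-based models; drop it with `normalize=False` to get plain concatenation.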