Encoder and decoder for sequence labeling using a transformer? [Discussion]
I use a full transformer (encoder-decoder) for sequence labeling.
During training, the whole transformer processes the NER task like this:
encoder source word sequence input: “Lily goes to company”
decoder target tag sequence input: “Person O O Location”
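For context, here is a minimal sketch of how the decoder input is commonly prepared when training with teacher forcing: the tag sequence is shifted right by one position. The `<bos>` and `<eos>` markers and the `make_decoder_pair` helper are my assumptions for illustration, not something from my setup above; if they are used, they would also have to be counted in the tag vocabulary.

```python
# Assumed special markers (hypothetical, not part of the 6-tag vocabulary above).
BOS = "<bos>"
EOS = "<eos>"

def make_decoder_pair(tags):
    """Return (decoder_input, labels) for teacher forcing.

    The decoder reads the tags shifted right by one step and is trained
    to predict the tag at the current step.
    """
    decoder_input = [BOS] + tags
    labels = tags + [EOS]
    return decoder_input, labels

source = ["Lily", "goes", "to", "company"]
tags = ["Person", "O", "O", "Location"]

decoder_input, labels = make_decoder_pair(tags)
print(decoder_input)
print(labels)
```

With this convention the decoder never sees the tag it is asked to predict at the same position.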
If the source vocabulary has 5272 words and the target vocabulary has 6 tags,
must the encoder input embedding size equal the decoder input embedding size?
encoder sequence: “Lily goes to company”
target sequence: “Person O O Location”
The encoder input embedding should map shape [4, 5272] to [4, 512].
Should the decoder input embedding then map shape [4, 6] to [4, 512]?
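To make the shapes concrete, here is a rough sketch in plain Python, assuming the standard setup where the encoder and decoder each have their own embedding table and both project to the same model width of 512. The vocabulary sizes come from my numbers above; the token ids, the random initialization, and the helper names (`make_embedding`, `embed`, `shape`) are made up for illustration.

```python
import random

D_MODEL = 512      # shared model width
SRC_VOCAB = 5272   # source word vocabulary size
TGT_VOCAB = 6      # target tag vocabulary size

def make_embedding(vocab_size, d_model, seed):
    """Build a vocab_size x d_model table of small random weights."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in range(vocab_size)]

def embed(table, ids):
    """Pick one row per token id; same result as one_hot(ids) @ table."""
    return [table[i] for i in ids]

def shape(matrix):
    """Return (rows, columns) of a list-of-lists matrix."""
    return (len(matrix), len(matrix[0]))

src_table = make_embedding(SRC_VOCAB, D_MODEL, seed=0)
tgt_table = make_embedding(TGT_VOCAB, D_MODEL, seed=1)

# Hypothetical ids for "Lily goes to company" and "Person O O Location".
src_ids = [17, 254, 9, 1033]
tgt_ids = [1, 0, 0, 2]

src_emb = embed(src_table, src_ids)
tgt_emb = embed(tgt_table, tgt_ids)

print(shape(src_table), shape(src_emb))  # (5272, 512) (4, 512)
print(shape(tgt_table), shape(tgt_emb))  # (6, 512) (4, 512)
```

In this sketch the two tables differ only in their number of rows (the vocabulary size); the output width is the same 512 on both sides, so a one-hot [4, 6] tag input would indeed come out as [4, 512].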