[D] BERT – mask vs [PAD] token
I’m trying out this BERT model from TFHub.
If my input tokens are `[CLS] Hello world ! [SEP] [PAD] [PAD] [PAD] [PAD]`, do I need to build an attention mask like `[1 1 1 1 1 0 0 0 0]`, or can I just pass `[1 1 1 1 1 1 1 1 1]` and let the [PAD] token take care of it?
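In code, here's a minimal sketch of the two options, assuming one of the TF2 BERT SavedModels on TFHub with the standard dict signature (the hub handle and the token IDs are just for illustration):

```python
import tensorflow as tf
import tensorflow_hub as hub

# Any TF2 BERT encoder from TFHub with the standard dict signature;
# this handle is just an example.
encoder = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"
)

# Token IDs for: [CLS] Hello world ! [SEP] [PAD] [PAD] [PAD] [PAD]
# ([PAD] is id 0 in the standard uncased vocab; the other IDs are
# taken from bert-base-uncased but treat them as illustrative).
input_word_ids = tf.constant([[101, 7592, 2088, 999, 102, 0, 0, 0, 0]])

# Option A: zero out the padding positions.
mask_a = tf.constant([[1, 1, 1, 1, 1, 0, 0, 0, 0]])
# Option B: all ones, relying on the [PAD] embedding alone.
mask_b = tf.constant([[1, 1, 1, 1, 1, 1, 1, 1, 1]])

outputs = encoder({
    "input_word_ids": input_word_ids,
    "input_mask": mask_a,  # vs. mask_b -- which is right?
    "input_type_ids": tf.zeros_like(input_word_ids),  # single segment
})
pooled = outputs["pooled_output"]      # [batch, hidden]
sequence = outputs["sequence_output"]  # [batch, seq_len, hidden]
```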