[DISCUSSION] BERT Token Embeddings
From the paper it is easy to understand that BERT's input is composed of token embeddings, positional encodings, and sentence (segment) encodings. The last two are well defined in the BERT paper and in "Attention Is All You Need". But it is not clear how the token embeddings are built, and reading around on the internet I found different opinions. Tokenization is certainly performed with WordPiece, and it's easy to understand how it splits words into subword tokens. But once you have a token id, how does BERT convert it into an embedding?
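For context, here is a minimal sketch of what I *think* happens, assuming the token id is just an index into a learned lookup table that is trained jointly with the rest of the model. The class name and dimensions are mine (I used the bert-base sizes: vocab 30522, hidden 768), and the real implementation also applies LayerNorm and dropout on top, so treat this as my current understanding rather than the actual code:

```python
import torch
import torch.nn as nn

class BertInputEmbeddings(nn.Module):
    """Sketch of BERT's input embedding layer (bert-base dimensions assumed)."""

    def __init__(self, vocab_size=30522, hidden_size=768,
                 max_position=512, type_vocab_size=2):
        super().__init__()
        # Each token id indexes one row of this learned matrix.
        self.token_embeddings = nn.Embedding(vocab_size, hidden_size)
        # BERT's positional and segment (sentence A/B) embeddings are
        # also learned lookup tables of the same width.
        self.position_embeddings = nn.Embedding(max_position, hidden_size)
        self.segment_embeddings = nn.Embedding(type_vocab_size, hidden_size)

    def forward(self, token_ids, segment_ids):
        # Positions 0..seq_len-1, one per token in the sequence.
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        # The three embeddings are summed element-wise.
        return (self.token_embeddings(token_ids)
                + self.position_embeddings(positions)
                + self.segment_embeddings(segment_ids))

# Usage: one sequence of 4 WordPiece token ids (illustrative ids).
emb = BertInputEmbeddings()
token_ids = torch.tensor([[101, 7592, 2088, 102]])  # e.g. [CLS] hello world [SEP]
segment_ids = torch.zeros_like(token_ids)           # all tokens in sentence A
print(emb(token_ids, segment_ids).shape)            # torch.Size([1, 4, 768])
```

Is it really just a plain `nn.Embedding`-style lookup like this, with the table learned end to end during pre-training, or is there some extra step I'm missing?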