[D] word2vec architecture

Written by torontoai on October 8, 2019. Posted in Reddit MachineLearning.

I was trying to understand the skip-gram model of word2vec and got stuck on the details. I’m clear about the high-level idea: given a word, predict its context. However, when you actually train the model, what are the input and output for a particular training instance?

To be concrete, and setting aside techniques like negative sampling, take the sentence “it is a beautiful day today”. For the CBOW version, the input would be the average of the one-hot encodings of “it”, “is”, “a”, “day”, “today”, and the output should ideally be the one-hot encoding of “beautiful”. For skip-gram, I’m confused: given the one-hot encoding of “beautiful” as input, what should the output be? Should it be the average of the one-hot encodings of “it”, “is”, “a”, “day”, “today” in a single training instance, or each of “it”, “is”, “a”, “day”, “today” as the target in 5 separate training instances? I tried to go through the gensim codebase to understand what they do, but it’s not clear to me.
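To make the two readings concrete, here is a minimal sketch that just builds the candidate training instances for the example sentence. It is not taken from gensim; the names (`one_hot`, `context_of`, `skipgram_single`, `skipgram_pairs`) are made up for illustration, and the window is assumed to be wide enough to cover the whole sentence. For what it's worth, the skip-gram objective in the original word2vec papers is written as a sum over individual (center, context) pairs, which corresponds to the second reading.

```python
# Toy setup (hypothetical): the vocabulary is just the words of the sentence,
# and the context window is assumed to span the whole sentence.
sentence = ["it", "is", "a", "beautiful", "day", "today"]
vocab = {word: idx for idx, word in enumerate(sentence)}


def one_hot(word):
    """Return the one-hot encoding of `word` as a list of floats."""
    vec = [0.0] * len(vocab)
    vec[vocab[word]] = 1.0
    return vec


def context_of(position):
    """Return every word in the sentence except the one at `position`."""
    return [w for i, w in enumerate(sentence) if i != position]


center = "beautiful"
context = context_of(sentence.index(center))

# Element-wise average of the context one-hot vectors.
averaged_context = [
    sum(column) / len(context)
    for column in zip(*(one_hot(w) for w in context))
]

# CBOW: averaged context in, center word out (one instance).
cbow_instance = (averaged_context, one_hot(center))

# Skip-gram, reading A: one instance whose target is the averaged context.
skipgram_single = (one_hot(center), averaged_context)

# Skip-gram, reading B: one instance per (center, context word) pair.
skipgram_pairs = [(one_hot(center), one_hot(w)) for w in context]

print(len(skipgram_pairs), "separate (input, target) pairs under reading B")
```

Reading A yields a single target vector with 0.2 at each of the five context positions; reading B yields five instances that share the same input and differ only in the target.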

As an extension to this question, I also wanted to know what happens in negative sampling. My understanding is that, instead of requiring every element of the output vector to match the expected one-hot encoding, we only enforce 1s and 0s at a select few positions (corresponding to the positive and negative samples), which reduces the amount of back-propagation. Is this correct?
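A rough sketch of that intuition, under some assumptions: each selected output word is scored with an independent sigmoid (label 1 for the positive sample, 0 for each negative) rather than a softmax over the whole vocabulary, and only the output rows of the sampled words are updated. The sizes, the random weights, the learning rate, and the function name `negative_sampling_step` are all hypothetical, and the negatives are hard-coded here instead of being drawn from a noise distribution.

```python
import math
import random

random.seed(0)
vocab_size, dim = 6, 4  # hypothetical toy sizes

# Input (center-word) and output (context-word) embedding tables.
w_in = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def negative_sampling_step(center, positive, negatives, lr=0.1):
    """Run one SGD step and return the set of output rows that were updated."""
    hidden = w_in[center][:]  # copy, so every sample sees the same hidden vector
    grad_hidden = [0.0] * dim
    touched = set()
    samples = [(positive, 1.0)] + [(neg, 0.0) for neg in negatives]
    for word, label in samples:
        score = sum(h * o for h, o in zip(hidden, w_out[word]))
        err = sigmoid(score) - label  # d(binary cross-entropy) / d(score)
        for d in range(dim):
            grad_hidden[d] += err * w_out[word][d]
            w_out[word][d] -= lr * err * hidden[d]
        touched.add(word)
    for d in range(dim):
        w_in[center][d] -= lr * grad_hidden[d]
    return touched


before = [row[:] for row in w_out]
touched = negative_sampling_step(center=3, positive=4, negatives=[0, 2])
unchanged = [i for i in range(vocab_size) if w_out[i] == before[i]]
print("updated output rows:", sorted(touched))
print("untouched output rows:", unchanged)
```

In this sketch only the rows for the positive word and the two negatives receive gradient; the remaining output rows are never read or written during the step, which is where the saving over a full softmax would come from.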

submitted by /u/alexsolanki
