
[D] word2vec architecture

I was trying to understand the skip-gram model of word2vec, and I had some trouble understanding the details. I’m clear about the high-level idea – given a word, predict the context of the word. However, when you actually train the model, what is the input and output for a particular training instance? To be more concrete with an example, disregarding all sophisticated techniques like negative sampling etc., if I have the sentence “it is a beautiful day today”, the input to the CBOW version would be the average of the one-hot encodings of “it”, “is”, “a”, “day”, “today”, and the output should ideally be the one-hot encoding of “beautiful”. For skip-gram, I’m confused – given the input one-hot encoding of “beautiful”, what should the output be? Should it be the average of the one-hot encodings of “it”, “is”, “a”, “day”, “today” in a single training instance, or “it”, “is”, “a”, “day”, “today” in 5 separate training instances? I tried to go through the gensim codebase to understand what they do, but it’s not clear.
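For illustration, here is a minimal sketch (not gensim’s actual code) of how the example sentence could be turned into training instances under each architecture; treating the whole sentence as the context window just follows the example above:

```python
# Sketch of training-instance construction for CBOW vs. skip-gram.
# Illustration only, not gensim's implementation; the "whole sentence as
# window" assumption follows the example in the question.

sentence = ["it", "is", "a", "beautiful", "day", "today"]
center_index = 3  # "beautiful"

center = sentence[center_index]
context = [w for i, w in enumerate(sentence) if i != center_index]

# CBOW: one training instance per center word.
# Input: the context words (their one-hot vectors are averaged inside the
# model); target: the center word.
cbow_instance = (context, center)

# Skip-gram (standard formulation): one training instance per
# (center, context-word) pair, i.e. 5 separate instances here,
# not a single averaged target.
skipgram_instances = [(center, ctx) for ctx in context]

print(cbow_instance)
# (['it', 'is', 'a', 'day', 'today'], 'beautiful')
print(skipgram_instances)
# [('beautiful', 'it'), ('beautiful', 'is'), ('beautiful', 'a'),
#  ('beautiful', 'day'), ('beautiful', 'today')]
```

In the standard skip-gram objective each context word is a separate prediction target, so the example yields 5 (center, context) pairs rather than one averaged output.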

As an extension to this question, I also wanted to know what happens in negative sampling. The way I have understood it is that instead of forcing every element of the output vector to match the expected one-hot encoding exactly, we only enforce 1s and 0s at a select few positions (corresponding to the positive and negative samples), which reduces the amount of back-propagation. Is this correct?
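As a rough sketch of that idea (a toy NumPy illustration, not gensim’s code; the vocabulary size, embedding dimension, word ids, and number of negatives are made-up values), negative sampling replaces the full-vocabulary softmax with binary logistic losses on the true pair plus a handful of sampled negatives, so only those rows receive gradient updates:

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, k = 10_000, 100, 5          # vocab size, embedding dim, negatives per pair
W_in = rng.normal(scale=0.01, size=(V, D))   # input (center-word) embeddings
W_out = rng.normal(scale=0.01, size=(V, D))  # output (context-word) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

center_id, context_id = 42, 7                # hypothetical word ids
negative_ids = rng.integers(0, V, size=k)    # in word2vec, drawn from a unigram^0.75 distribution

v = W_in[center_id]
# Push the true (center, context) pair's score toward 1 and each
# (center, negative) pair's score toward 0.
loss = -np.log(sigmoid(W_out[context_id] @ v))
loss -= np.sum(np.log(sigmoid(-W_out[negative_ids] @ v)))

# Only W_in[center_id] and the k+1 output rows W_out[context_id] and
# W_out[negative_ids] get gradients, instead of all V output positions.
```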

submitted by /u/alexsolanki

