[R] What the Vec? Towards Probabilistically Grounded Embeddings

TL;DR: This is why word2vec works.



Word2Vec (W2V) and Glove are popular word embedding algorithms that perform well on a variety of natural language processing tasks. The algorithms are fast and efficient, and their embeddings are widely used. Moreover, the W2V algorithm has recently been adopted in the field of graph embedding, where it underpins several leading algorithms. However, despite their ubiquity and the relative simplicity of their common architecture, what the embedding parameters of W2V and Glove learn, and why that is useful in downstream tasks, largely remains a mystery. We show that different interactions of PMI (pointwise mutual information) vectors encode semantic properties that can be captured in low-dimensional word embeddings by suitable projection. This theoretically explains why the embeddings of W2V and Glove work and, in turn, reveals an interesting mathematical interconnection between the semantic relationships of relatedness, similarity, paraphrase and analogy.
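
The paper's own derivation is not reproduced here. As a rough illustration of the objects the abstract refers to, the sketch below builds PMI vectors (rows of a PMI matrix) from a toy word-by-context count matrix and projects them to d dimensions. The counts are made up, unseen pairs are set to 0 as a simplifying assumption, and the truncated SVD is a stand-in projection: W2V and Glove learn their embeddings by optimising their own loss functions, not by this factorisation.

```python
import numpy as np


def pmi_matrix(counts: np.ndarray) -> np.ndarray:
    """PMI(i, j) = log[p(i, j) / (p(i) p(j))] for a word-by-context count matrix."""
    p_ij = counts / counts.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)  # marginal over words (rows)
    p_j = p_ij.sum(axis=0, keepdims=True)  # marginal over contexts (columns)
    with np.errstate(divide="ignore"):  # log(0) -> -inf for unseen pairs
        pmi = np.log(p_ij) - np.log(p_i) - np.log(p_j)
    # Assumption for this sketch: unseen pairs are set to 0 instead of -inf.
    pmi[np.isneginf(pmi)] = 0.0
    return pmi


def project(pmi: np.ndarray, d: int) -> tuple[np.ndarray, np.ndarray]:
    """Rank-d factorisation pmi ~= W @ C.T via truncated SVD (a stand-in, not W2V/Glove training)."""
    U, S, Vt = np.linalg.svd(pmi)
    root = np.sqrt(S[:d])
    W = U[:, :d] * root     # word embeddings, one row per word
    C = Vt[:d, :].T * root  # context embeddings, one row per context
    return W, C


# Made-up symmetric co-occurrence counts for a hypothetical 4-word vocabulary.
counts = np.array([
    [10, 2, 0, 1],
    [2, 8, 1, 0],
    [0, 1, 6, 3],
    [1, 0, 3, 7],
], dtype=float)

pmi = pmi_matrix(counts)
W, C = project(pmi, d=2)
print(W.shape, C.shape)  # (4, 2) (4, 2)
```

With d equal to the vocabulary size, W @ C.T reproduces the PMI matrix exactly; a smaller d gives the best rank-d approximation in the least-squares sense (Eckart–Young).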

Key contributions:

  • to show that semantic similarity is captured by high-dimensional PMI vectors and, by considering geometric and probabilistic aspects of such vectors and their domain, to establish a hierarchical mathematical interrelationship between relatedness, similarity, paraphrases and analogies;
  • to show that these semantic properties arise through additive interactions and so are best captured in low-dimensional word embeddings by linear projection, thus explaining, by comparison of their loss functions, the presence of semantic properties in the embeddings of W2V and Glove;
  • to derive a relationship between learned embedding matrices, proving that they necessarily differ (in the real domain), justifying the heuristic use of their mean, showing that different interactions are required to extract different semantic information, and enabling popular embedding comparisons, such as cosine similarity, to be semantically interpreted.
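
The third contribution concerns the two matrices W2V learns (word and context embeddings), the heuristic of averaging them, and cosine similarity as a comparison. A minimal sketch of those two operations follows, using small made-up matrices W and C (hypothetical values, not the output of any trained model). It shows the mechanics only and does not reproduce the paper's argument about why the matrices differ or how the cosine should be interpreted.

```python
import numpy as np


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: dot product of the two vectors divided by their norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical 2-d word (W) and context (C) embeddings for a 3-word vocabulary;
# the numbers are made up for illustration, not taken from a trained model.
W = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
C = np.array([[0.8, 0.2], [1.0, 0.0], [0.1, 0.9]])

# The common heuristic: use the mean of the two learned matrices as the final embedding.
E = (W + C) / 2

print(round(cosine(E[0], E[1]), 3))  # words 0 and 1 point in similar directions
print(round(cosine(E[0], E[2]), 3))  # words 0 and 2 are nearly orthogonal
```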

submitted by /u/ibalazevic