- I really want to see doc2vec but with contextualized vectors (BERT, ELMo, etc.) instead of word2vec. I think it would be a slam dunk. I don’t think I’ll ever get around to testing this, but if anyone wants to do it, I’ll be happy to give some guidance if needed.
- I would really like to see word2vec or GloVe tested with the context limited to other words within the same sentence as the target word, or perhaps extended to any word in the same paragraph. I was planning on doing this myself, but lost some motivation with the rise of contextualized vectors. I still think it would give some great insight.
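To make the second idea concrete, here is a minimal sketch (my own, not from the post) of how the training data would change: skip-gram (target, context) pairs are generated with the window clipped at sentence boundaries, so a target word never draws context from a neighboring sentence. The tokenization and window size are illustrative assumptions.

```python
# Sketch of sentence-bounded skip-gram pair generation: the context
# window for each target word is clipped to its own sentence, which is
# the proposed restriction. Tokenizer and window size are assumptions.

def sentence_bounded_pairs(sentences, window=2):
    """sentences: list of token lists, one list per sentence."""
    pairs = []
    for sent in sentences:
        for i, target in enumerate(sent):
            lo = max(0, i - window)
            hi = min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((target, sent[j]))
    return pairs

doc = [["the", "cat", "sat"], ["dogs", "bark", "loudly"]]
pairs = sentence_bounded_pairs(doc, window=2)
# "sat" never pairs with "dogs": the window does not cross sentences.
```

Extending the context to the whole paragraph would just mean concatenating each paragraph's sentences into one token list before calling this.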
A NeurIPS 2019 paper that looks interesting:
Symmetry-Based Disentangled Representation Learning cannot rely on static observations alone: agents should interact with the environment to discover its symmetries.
I’m not familiar with this line of research, but it seems like this could have significant implications for how models are trained, since many current benchmark datasets are static. I’d be interested in hearing thoughts from those more familiar with the method.
I’ve written a new blog post (https://rajatvd.github.io/Factor-Graphs/) on an awesome visualization tool that I recently came across: factor graphs. I initially encountered them in the context of message passing on graphical models, but soon realized that they are useful in more general contexts.
This is the first post in a series that covers the basics and mainly focuses on understanding how factor graphs work as a visualization tool, along with a cool example of a visual proof using them. In future posts, I plan to cover algorithms like message passing and belief propagation using this visualization framework.
I made the animations using manim, the math animation tool created by the amazing 3blue1brown. I built a small library, manimnx, on top of manim to help interface it with the graph package networkx. You can find the code for the animations in this GitHub repo.
Feedback is welcome!
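For readers who haven't met factor graphs before, here is a tiny sketch (mine, not from the post) of the structure they visualize: a bipartite graph connecting factor nodes to the variable nodes they touch, shown for the factorization p(x1, x2, x3) = f_a(x1, x2) · f_b(x2, x3).

```python
# Minimal factor-graph structure as a bipartite adjacency map.
# Factors connect only to variables, never to each other; this
# example factorization is an illustrative assumption.

factors = {
    "f_a": ["x1", "x2"],  # f_a(x1, x2)
    "f_b": ["x2", "x3"],  # f_b(x2, x3)
}

def neighbors(node):
    """Bipartite neighbors: a factor's variables, or the
    factors a variable appears in."""
    if node in factors:
        return factors[node]
    return [f for f, vs in factors.items() if node in vs]

# x2 is shared by both factors, so it sits between them in the
# graph -- exactly the kind of structure the visualizations show.
```

Message-passing algorithms like belief propagation, which the series plans to cover, operate by sending messages along exactly these factor-variable edges.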
Interesting research from DeepMind:
“Our new work on memory uses a neural network’s weights as fast and compressive associative storage. Reading from the memory is performed by approximate minimization of the energy modeled by the network.”
“Unlike classical associative memory models such as Hopfield networks, we are not limited in the expressivity of our energy model, and make use of deep architectures with fully connected, convolutional and recurrent layers.”
“For this to work, stored patterns must be local minima of the energy. We use recent advances in gradient-based meta-learning to write into the memory such that this requirement approximately holds.”