[P] CURL: How to learn better sentence embeddings
I’ve been working on a project about SOTA methods for learning sentence embeddings. In particular I took a look at QuickThoughts, which uses a word2vec-like contrastive objective: given a sentence, identify the “related” sentences around it in the text.
To that end, I’ve published an open-source PyTorch implementation of QuickThoughts here: https://github.com/jcaip/quickthoughts
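For anyone curious what the word2vec-like objective looks like concretely: each sentence in a minibatch of consecutive sentences is scored against every other sentence, and its neighbours within `context_size` are treated as the targets. Below is a framework-agnostic NumPy sketch of that loss (the repo itself uses PyTorch; the function name, the inner-product score, and the uniform target distribution here are simplifications of the paper's setup, not code from the repo):

```python
import numpy as np

def quickthoughts_loss(f_emb, g_emb, context_size=1):
    """Sketch of the QuickThoughts contrastive objective.

    f_emb, g_emb: (batch, dim) embeddings of a batch of *consecutive*
    sentences, produced by the two encoders f and g. Each sentence's
    positives are its neighbours within `context_size`; every other
    sentence in the batch acts as a negative.
    """
    n = f_emb.shape[0]
    scores = f_emb @ g_emb.T                   # pairwise inner-product scores
    np.fill_diagonal(scores, -np.inf)          # a sentence never predicts itself
    # row-wise log-softmax over candidate context sentences
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # uniform target distribution over the true context window
    targets = np.zeros((n, n))
    for i in range(n):
        ctx = [j for j in range(max(0, i - context_size),
                                min(n, i + context_size + 1)) if j != i]
        targets[i, ctx] = 1.0 / len(ctx)
    # cross-entropy against the context targets (masking the -inf diagonal)
    safe = np.where(np.isfinite(log_probs), log_probs, 0.0)
    return float(-(targets * safe).sum(axis=1).mean())
```

`context_size` here is exactly the knob discussed below: increasing it widens the set of sentences treated as positives.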
In addition, I tried to use the theoretical framework for contrastive unsupervised representation learning described by Arora et al. to examine the effect of varying the `context_size` of QuickThoughts, and to motivate a possible modification that should increase performance. Unfortunately this was unsuccessful, but hopefully you’ll still find the work interesting.
Please lmk if you have any questions/comments, thanks for reading!