[Discussion] NLP Embeddings Applied to Classification
I’ve been experimenting lately with word embeddings from different pretrained models like BERT and ELMo. I’ve tried applying them to a sequence classification problem: I generate a sequence embedding by taking the mean of the token embeddings in the second-to-last hidden layer of BERT, then train logistic regression and random forest models on those embeddings.
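For concreteness, the pooling step I mean looks roughly like this. This is just a sketch: random matrices stand in for the real second-to-last hidden layer (with Hugging Face transformers you'd take `outputs.hidden_states[-2]` from a model loaded with `output_hidden_states=True`), and the labels are random placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for BERT's second-to-last hidden layer: one
# (seq_len, 768) matrix of token embeddings per sequence.
# In practice this would come from outputs.hidden_states[-2].
hidden_states = [rng.normal(size=(rng.integers(5, 30), 768)) for _ in range(500)]
labels = rng.integers(0, 3, size=500)  # placeholder 3-label targets

# Mean-pool token embeddings into one fixed-size vector per sequence.
X = np.stack([h.mean(axis=0) for h in hidden_states])  # shape (500, 768)

clf = LogisticRegression(max_iter=1000).fit(X, labels)
```

(With real BERT output you'd also want to exclude padding tokens from the mean, using the attention mask.)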
However, this doesn’t seem to work that well on small datasets (in my case, 500 data points for a 3-label classification problem). Am I right that classification on top of these embeddings only works well once you have tens of thousands of data points? All the sequence classification examples I’ve seen using these embeddings seem to support this, since they have far more data (e.g. Google’s IMDB movie review sentiment example). Or are there ways to get robust classifiers with less data? I was thinking of trying fine-tuning, or using PCA to reduce the dimensionality of the sequence embeddings before building the classifier.
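If anyone wants to comment on the PCA idea specifically, here's roughly what I had in mind: fit the reduction inside a cross-validation pipeline so it's only learned on the training folds. Random data stands in for the 768-dim embeddings, and `n_components=50` is an arbitrary choice I'd tune, not a recommendation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 768))     # placeholder for 768-dim BERT embeddings
y = rng.integers(0, 3, size=500)    # placeholder 3-label targets

# Shrink 768 dims down to 50 before the classifier sees them; with
# only 500 points this cuts the parameter count and may reduce overfitting.
pipe = make_pipeline(StandardScaler(), PCA(n_components=50),
                     LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

On random labels like these the mean accuracy should hover around chance (~0.33); the point is just the pipeline structure.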