[D] Ways to classify text with very little training data?
What techniques does this group feel work best when classifying text with very little training data?
I ask because I recently put together a tutorial that shows how to use TensorFlow Data Pipelines with BERT for text classification. It gets 85% accuracy, but it tends to work best when there are at least 200 examples of a particular class.
I am not sure this technique will work if I only have one or two training examples for a class. For example, say I have a piece of text that reads “I was in a line today for 3 hours” and I want to classify it as “Long wait times”, but I only have one or two examples like it. I think this problem gets worse with engineering text or text that is specific to a corporation, where it would be difficult to generate examples or to get Mechanical Turk workers to label them correctly.
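To make the setting concrete, here is a rough sketch of how I would flag the classes I am worried about. It assumes the labels sit in a plain Python list. The 200 cutoff is just the rough figure from my tutorial, not a hard rule, and the helper name and every label other than “Long wait times” are made up for illustration.

```python
from collections import Counter

# Rough cutoff taken from the tutorial experience described above; an assumption, not a rule.
MIN_EXAMPLES = 200


def low_data_classes(labels, min_examples=MIN_EXAMPLES):
    """Return {label: count} for every class with fewer than min_examples examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_examples}


# Hypothetical label distribution, invented purely for illustration.
labels = (
    ["Billing issue"] * 250
    + ["Rude staff"] * 40
    + ["Long wait times"] * 2
)

# Classes that fall under the cutoff are the ones this question is about.
print(low_data_classes(labels))
```

In this made-up split, “Long wait times” is the kind of class I mean: it exists in the label set but has almost nothing to train on.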
What are your thoughts on this? Have you seen better ways to classify text when there is very little data?