[P] Frequency-based selection vs TF-IDF score-based selection
I’m working on text segmentation with novels. I plan to limit the vocabulary to 40,000 words when generating word embeddings. When choosing which words to keep, is it okay to simply take the most frequent ones, or should I rank them by something like a TF-IDF score?
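To make the two options concrete, here is a minimal sketch of both selection rules using only the Python standard library. It rests on several assumptions that are not part of my actual setup: each "document" is a pre-tokenized list of words (for novels this could be a chapter or a whole book), the TF-IDF variant scores a word by its highest TF-IDF value in any single document, and the function names `top_by_frequency` and `top_by_tfidf` are made up for illustration.

```python
import math
from collections import Counter


def top_by_frequency(docs, k):
    """Return the k most frequent words across all documents."""
    counts = Counter(word for doc in docs for word in doc)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:k]


def top_by_tfidf(docs, k):
    """Return the k words with the highest TF-IDF score in any one document.

    Assumption: a word's overall score is its maximum per-document TF-IDF.
    Averaging or summing over documents are equally plausible choices.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        if not doc:
            continue  # skip empty documents to avoid dividing by zero
        term_counts = Counter(doc)
        for word, count in term_counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            best[word] = max(best.get(word, 0.0), tf * idf)
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:k]


# Toy corpus standing in for tokenized chapters (hypothetical data).
docs = [
    "the whale and the sea".split(),
    "the ball and the dance".split(),
    "the whale and the captain".split(),
]

print(top_by_frequency(docs, 3))
print(top_by_tfidf(docs, 3))
```

On this toy corpus the frequency rule keeps function words such as "the" and "and", while the TF-IDF rule gives them a score of zero because they occur in every document, so it keeps document-specific words instead. Whether that behavior is desirable for embeddings used in segmentation is exactly what I am unsure about.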
submitted by /u/aklagoo