[P] Classify whether some text is talking about Apple or apples
I have a project with a very large dataset of texts that contain the term “apple” (case-insensitive). My job is to determine, for each text, whether it is talking about Apple the company or about something else.
There are many ways to approach this, and I can’t find the best one. I suspect it could be done without any machine learning, but as a lazy data scientist I want the process to be as automated as possible, so that it generalizes to other ambiguous words.
I tried some NLP techniques, such as a bag-of-words representation followed by k-means clustering, but the results were horrible.
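For reference, here is a minimal, dependency-free sketch of that kind of pipeline: raw word counts per text, then k-means with k=2. It is an illustration of the general technique, not the exact code I ran; the tokenizer, the "first k texts as initial centroids" choice, and the iteration count are all hypothetical simplifications.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep only alphabetic tokens
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(docs):
    # Build a shared vocabulary and one raw-count vector per document
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(vec)
    return vocab, vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, iterations=20):
    # Hypothetical deterministic init: use the first k vectors as centroids
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

On tiny, clean examples this separates the two senses, but on real data raw counts with Euclidean distance tend to be dominated by text length and frequent words (and "apple" itself appears in every text), which is one plausible reason the clusters did not line up with the company/fruit split.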
The problem is that there is no labeled dataset.
I have some ideas, such as a proper noun / common noun classifier, or using Wikipedia to build a context vocabulary for each meaning.
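A rough sketch of how those two ideas could be combined without labels, assuming a capitalization heuristic as a stand-in for a real proper-noun classifier and small hand-written seed vocabularies as a stand-in for vocabularies extracted from Wikipedia. The word lists, the window size, and the names `capitalization_score`, `context_score`, and `classify` are all hypothetical.

```python
import re

# Hypothetical seed vocabularies; in practice these might be built from the
# Wikipedia articles on each sense of the word (not done here).
COMPANY_CONTEXT = {"iphone", "ipad", "mac", "ios", "stock", "shares", "launch", "store", "app"}
FRUIT_CONTEXT = {"pie", "tree", "juice", "eat", "ate", "orchard", "fruit", "cider", "sweet"}


def capitalization_score(text):
    # Proper-noun heuristic: a capitalized "Apple" that does not start a
    # sentence counts for the company; a lowercase "apple(s)" counts against it.
    score = 0
    for m in re.finditer(r"\b[Aa]pples?\b", text):
        before = text[:m.start()].rstrip()
        sentence_start = before == "" or before[-1] in ".!?"
        if m.group().startswith("A") and not sentence_start:
            score += 1
        elif m.group().startswith("a"):
            score -= 1
    return score


def context_score(text, window=5):
    # Count seed-vocabulary words within `window` tokens of each "apple(s)"
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        if tok in ("apple", "apples"):
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            score += sum(t in COMPANY_CONTEXT for t in nearby)
            score -= sum(t in FRUIT_CONTEXT for t in nearby)
    return score


def classify(text):
    # Positive total -> company, negative -> other sense, zero -> undecided
    total = capitalization_score(text) + context_score(text)
    if total > 0:
        return "company"
    if total < 0:
        return "other"
    return "unknown"
```

The capitalization signal breaks down on all-lowercase text (common on social media), which is why it is paired with the context vocabulary here; texts labeled confidently by such rules could then serve as a weak training set for a proper classifier.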
Any ideas? Thanks.