[D] What do you think of this NLP academic research idea?
So my native tongue, Persian, is one of those languages in which you can create compound words out of the existing vocabulary. Here’s an example: a few years ago, after reading a dissertation on color correction, I downloaded DaVinci Resolve, realized that color correction is more a science than an art, and wanted to market myself as a color corrector. But there were no color correctors in Iran, so I had to coin a name for the service myself. I came up with the title “rangband”: rang = color, and band, from bastan (to close or tie), also means “setter”. So rangband = colorsetter. Turns out I was an idiot for thinking that I, someone who lacks even a speck of artistic thinking (I do make nice machine learning and signal processing concept videos on my blog, though), could do color correction, so I gave up on it, just like I’ve given up on everything else I’ve ever done in my life.
However, a few days ago I was thinking that this feature of the Persian language could be harnessed for a nice and dandy classification model:
1- Pair up words to make compound words.
2- Train the model on these pairs, using the dictionary to label which compounds are real (meaningful) words.
3- Let the model classify whether a new compound has any meaning.
4- Test it on dictionary compounds that were held out from training.
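The data side of these four steps could be prototyped roughly like this. It is a minimal sketch, assuming romanized stems and a tiny hypothetical dictionary: `VOCAB`, `DICTIONARY`, and the helper names are illustrative placeholders, not a real lexicon or library; an actual study would load a real Persian dictionary.

```python
import itertools
import random

# Hypothetical toy data: romanized Persian stems and a handful of attested
# compounds standing in for a real dictionary (illustrative only).
VOCAB = ["rang", "band", "dast", "gol", "kar", "ab", "saz"]
DICTIONARY = {"dastband", "golab", "karsaz", "dastsaz", "abrang"}


def candidate_pairs(vocab):
    """Step 1: every ordered pair of distinct stems, e.g. ("rang", "band")."""
    return list(itertools.permutations(vocab, 2))


def label_pairs(pairs, dictionary):
    """Label a pair 1 if the joined word is in the dictionary, else 0."""
    return [((a, b), int(a + b in dictionary)) for a, b in pairs]


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Steps 2 and 4: shuffle, then hold out a fraction of the labeled pairs
    so the model is tested on compounds it never saw during training."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    labeled = label_pairs(candidate_pairs(VOCAB), DICTIONARY)
    train, test = train_test_split(labeled)
    print(len(train), "training pairs,", len(test), "test pairs")
```

One caveat baked into this sketch: any pair missing from the dictionary is labeled 0 (meaningless), so a fresh coinage like rangband counts as a negative even though it arguably has a meaning.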
This is a binary choice: a compound word either has a meaning or it doesn’t. So we can use SVMs, something I love, to get the job done.
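A linear SVM for this is small enough to write by hand. Below is a minimal sketch, assuming a Pegasos-style solver (stochastic sub-gradient descent on the L2-regularized hinge loss) and character-bigram features of the two stems; the feature design, the function names (`features`, `dot`, `train_linear_svm`, `predict`), and the toy labels are hypothetical choices for illustration, not a tested setup.

```python
import random
from collections import defaultdict


def features(first, second, n=2):
    """Sparse features for a candidate compound: character n-grams of each stem
    (tagged L/R), the two letters meeting at the joint, and a bias term.
    This feature design is an illustrative assumption."""
    feats = defaultdict(float)
    for tag, stem in (("L", first), ("R", second)):
        padded = f"^{stem}$"
        for i in range(len(padded) - n + 1):
            feats[f"{tag}:{padded[i:i + n]}"] += 1.0
    feats[f"J:{first[-1]}{second[0]}"] += 1.0
    feats["bias"] = 1.0
    return dict(feats)


def dot(w, x):
    """Inner product of a sparse weight dict and a sparse feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge
    loss. Labels in y must be +1 (meaningful) or -1 (not meaningful)."""
    rng = random.Random(seed)
    w = {}
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])
            for k in w:  # shrink step from the regularizer
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active only on margin violations
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    """Return +1 if the compound is predicted meaningful, else -1."""
    return 1 if dot(w, x) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical toy labels; a real run would use dictionary-derived labels.
    data = [(("dast", "band"), 1), (("gol", "ab"), 1), (("ab", "rang"), 1),
            (("band", "dast"), -1), (("ab", "gol"), -1), (("rang", "ab"), -1)]
    X = [features(a, b) for (a, b), _ in data]
    y = [label for _, label in data]
    w = train_linear_svm(X, y)
    print("rangband:", predict(w, features("rang", "band")))
```

A library SVM (for example scikit-learn’s `LinearSVC`) would do the same job; writing it by hand just makes the hinge-loss update explicit.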
So, thoughts? I wanna write an article and I’ll need a linguistics professor’s help, so tell me if I’m wasting my time (and, in turn, his) on this. Thank you.
TL;DR: I wanna pair up existing words and use a dictionary to check which pairings are meaningful. Possible?
PS: Please tell me if this thread would be a better fit for /r/learnmachinelearning. I’m not sure; I’m not that good at machine learning, but I write my own algorithms, and that sub is mostly about libraries, modules, and whatnot.