How to determine that “chicken” is related to “supermarket”, but not to, say, “synagogue” or “pharmacy”? [P]
Assume that we have an input string “I need to buy some chicken”. After some preprocessing, suppose that we’ve reduced it to “buy chicken”.
My question is: how can we determine that chicken is related to a cafe or supermarket, but not to a locksmith or post office?
More specifically, I have n point-of-interest types and I am trying to come up with n probabilities p_1, p_2, …, p_n, where each probability represents the likelihood (or meaningfulness) of a string-type pair.
My ultimate goal is to have an inequality over these n probabilities, which should of course be meaningful.
I want to have
p(chicken, synagogue) < p(chicken, supermarket)
But not
p(chicken, train_station) > p(chicken, café)
I have tried running Google searches and deriving these probabilities from the number of results, but the outcome wasn’t satisfactory at all.
For example, searching for “chicken breast EMBASSY” returned 24,500,000 results, while “chicken breast SUPERMARKET” returned only 11,600,000.
If we compute the probabilities by taking only these numbers into account, we’d arrive at the conclusion that p(chicken, supermarket) < p(chicken, embassy), which would of course be wrong.
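To make the problem concrete, here is a minimal sketch of that naive computation. It assumes the two hit counts are simply normalized so they sum to one; the function name `normalize_counts` is hypothetical, made up for this illustration.

```python
def normalize_counts(hit_counts):
    """Turn raw search hit counts into probabilities that sum to 1."""
    total = sum(hit_counts.values())
    return {poi_type: count / total for poi_type, count in hit_counts.items()}


# Hit counts quoted above for the query "chicken breast <TYPE>".
hit_counts = {"embassy": 24_500_000, "supermarket": 11_600_000}
probabilities = normalize_counts(hit_counts)

# Rank the types from most to least likely under this naive scheme.
ranking = sorted(probabilities, key=probabilities.get, reverse=True)
print(ranking)  # embassy comes out ahead of supermarket
```

Because the raw count for “embassy” is larger, it ranks first no matter how the two numbers are normalized, which is exactly the wrong ordering.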
Do you have any suggestions on how to approach this problem?
submitted by /u/orkalp