[D] Threshold for rejecting word embedding similarities

I have a set of target words that I need to match against words found in new CSV files, using word embedding similarity. Are there any good approaches to choosing the similarity threshold below which a match should be rejected? One idea is to take a random sample of 10k words and plot the distribution of their pairwise similarities (10,000 × 9,999 / 2, roughly 50 million pairs), but I am not sure whether this is the right approach. Alternatively, should I compute the distribution of each target word's similarities across a vocabulary and choose a percentile cutoff? Any ideas?
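To make the two options concrete, here is a minimal sketch of both, assuming the embeddings are available as a NumPy array with one row per vocabulary word and that cosine similarity is the measure. The function names, the 99th-percentile default, and the random stand-in matrix are all assumptions for illustration, not something taken from a particular library or from real embeddings.

```python
import numpy as np

def unit_rows(x):
    """L2-normalize each row so that dot products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def background_threshold(emb, sample_size=1000, percentile=99.0, seed=0):
    """Option 1: percentile of pairwise similarity among randomly sampled words."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, emb.shape[0])
    idx = rng.choice(emb.shape[0], size=n, replace=False)
    x = unit_rows(emb[idx])
    sims = x @ x.T
    # Keep only the n * (n - 1) / 2 unique pairs above the diagonal.
    pairs = sims[np.triu_indices(n, k=1)]
    return float(np.percentile(pairs, percentile))

def per_target_thresholds(emb, target_idx, percentile=99.0):
    """Option 2: percentile of each target word's similarity to the vocabulary."""
    target_idx = np.asarray(target_idx)
    x = unit_rows(emb)
    sims = x[target_idx] @ x.T  # shape: (n_targets, vocab_size)
    # Mask each target's similarity with itself (always 1.0).
    sims[np.arange(len(target_idx)), target_idx] = np.nan
    return np.nanpercentile(sims, percentile, axis=1)

# Stand-in data (assumption): random vectors in place of real embeddings.
rng = np.random.default_rng(42)
emb = rng.normal(size=(5000, 50))

global_thr = background_threshold(emb, sample_size=1000)
target_thr = per_target_thresholds(emb, [0, 1, 2])
print("global cutoff:", round(global_thr, 3))
print("per-target cutoffs:", np.round(target_thr, 3))
```

A candidate word would then be rejected when its similarity to a target falls below the chosen cutoff. The per-target version gives each target its own cutoff, which may matter if some targets sit in denser regions of the embedding space than others. Note that a full 10k × 10k similarity matrix in float64 takes about 800 MB, so a smaller sample or float32 may be more practical.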

submitted by /u/radcapbill