[D] Methods to perform unsupervised similarity scoring

I have a task that I don’t know how to tackle. I was given a set of positives and I have to find similar points in a big dataset (which I call the basket). I have around 1,000 positives and around 1,000,000 points in the basket. Each point is represented by 10 to 15 features. As output, I would like a score for each point in the basket that represents how close that point is to the positive set.

I first thought of using a k-nearest neighbours method on the positives, but this approach has two big drawbacks for me. First, I wouldn’t have a score for each point in the basket, as I would only get a set of close points for each positive. Second, and this is the biggest drawback in my opinion, I would have to define the distance in the n-dimensional space myself, whereas I would prefer a method that derives a weight for each feature directly from the data (for instance, based on the amount of information (variance) contained in each feature).
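To make the two drawbacks concrete, here is a minimal sketch of how the nearest-neighbour idea could be reversed so that every basket point gets a score: for each basket point, take the mean distance to its k nearest positives. Everything in it is an assumption for illustration only: the function names, the choice of k, the `1 / (1 + d)` mapping, and the use of per-feature standard deviation as the scale. Note that dividing by the standard deviation only puts features on a common scale; it does not give more weight to high-variance features, so it addresses the second drawback only partially.

```python
import math
import statistics


def feature_scales(points):
    """Per-feature standard deviation, used to put all features on one scale."""
    columns = list(zip(*points))
    # Guard against constant features, whose standard deviation is zero.
    return [statistics.pstdev(col) or 1.0 for col in columns]


def scaled_distance(a, b, scales):
    """Euclidean distance after dividing each feature difference by its scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def closeness_scores(positives, basket, k=5):
    """Score each basket point by its mean distance to its k nearest positives.

    The mean distance d is mapped to (0, 1] with 1 / (1 + d), so a higher
    score means the point is closer to the positive set.
    """
    scales = feature_scales(basket)  # scales estimated from the basket (assumption)
    k = min(k, len(positives))
    scores = []
    for point in basket:
        dists = sorted(scaled_distance(point, p, scales) for p in positives)
        mean_knn = sum(dists[:k]) / k
        scores.append(1.0 / (1.0 + mean_knn))
    return scores


# Toy data with two features on very different scales (made up for illustration).
positives = [(1.0, 100.0), (1.2, 110.0), (0.9, 95.0)]
basket = [(1.1, 105.0), (5.0, 500.0), (1.0, 300.0)]
scores = closeness_scores(positives, basket, k=2)
```

This brute-force loop compares every basket point with every positive, which is fine for a toy example but would need a vectorised or index-based neighbour search at the scale of 1,000 positives and 1,000,000 basket points.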

Could someone point me to a good approach for this problem?

Thanks!

submitted by /u/SupervisedHelloWorld