
True accuracy calculation in positive-unlabeled setting [D]

When only the positive class can be labeled (e.g. we have a pattern whose matches are known positives, but a failure to match does not mean the case is negative), we are in the positive-unlabeled (PU) setting. Such a data set is much easier to create (domain experts can often provide rules for what qualifies as positive, but cannot formulate rules that rule out the positive class), but accuracy metrics computed against the positive-unlabeled labels are biased.
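
To make the bias concrete, here is a minimal simulation (hypothetical numbers, assuming the common "selected completely at random" (SCAR) labeling model) in which even a perfect classifier loses about seven accuracy points when scored against the PU labels, because every unlabeled positive it correctly flags gets counted as an error:

    # Minimal simulation: a perfect classifier scored against PU labels.
    # pi and c below are hypothetical; labeling follows the SCAR assumption.
    import numpy as np

    rng = np.random.default_rng(0)
    n, pi, c = 100_000, 0.10, 0.30   # pi: true positive prior, c: P(labeled | positive)

    y = rng.random(n) < pi           # true (hidden) labels
    s = y & (rng.random(n) < c)      # observed PU labels: only some positives are labeled
    y_hat = y.copy()                 # a perfect classifier, for illustration

    print((y_hat == y).mean())       # true accuracy: 1.0
    print((y_hat == s).mean())       # PU "accuracy": ~0.93, off by pi * (1 - c)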

There are papers showing how to recover the true accuracy (i.e. the one that would be calculated on a data set containing both negative and positive examples) [1], but surprisingly they seem to be ignored in more applied papers – is there a reason? Such a method would solve a huge problem, especially in the medical domain, where the rarity of the positive class requires very large sample sizes for validation.
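
For what it's worth, once the class prior is in hand, the correction step itself is short. The sketch below is not the algorithm of [1]; it is my paraphrase of the standard SCAR-based identities such methods build on: recall measured on the labeled positives is an unbiased estimate of true recall, and precision and accuracy then follow from Bayes' rule and the prior (`pi`, `y_hat`, and `s` are hypothetical names):

    # Correction step, assuming the SCAR labeling model (labeled positives
    # are a uniform random sample of all positives).
    import numpy as np

    def corrected_metrics(y_hat, s, pi):
        """y_hat: 0/1 predictions; s: PU labels (1 = known positive); pi: class prior."""
        tpr = y_hat[s == 1].mean()    # recall on labeled positives estimates true recall under SCAR
        p_pos = y_hat.mean()          # P(y_hat = 1) over the whole sample
        precision = tpr * pi / p_pos  # Bayes: P(y=1 | y_hat=1) = TPR * pi / P(y_hat=1)
        # accuracy = pi*TPR + (1-pi)*TNR, with FPR = (P(y_hat=1) - pi*TPR) / (1-pi),
        # which simplifies to the line below
        accuracy = 2 * pi * tpr + 1 - pi - p_pos
        return tpr, precision, accuracy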

The major challenge in these works is estimating the positive class prior, and they propose various algorithms for that (e.g. AlphaMax [2]). Is there any reason not to simply manually review a sample of the positive and unlabeled sets and count the number of positive cases?
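
As a rough sanity check on the manual-review idea: reviewing a random sample of the unlabeled set does give an unbiased prior estimate, but the sample size needed for a fixed relative error grows roughly like 1/pi, which hurts exactly in the rare-positive regimes mentioned above. A back-of-the-envelope sketch (normal-approximation binomial interval, with a hypothetical target of 20% relative error):

    # How many manual reviews of the unlabeled set does a decent prior
    # estimate cost? Normal-approximation binomial CI; target half-width
    # of rel_err * pi (both numbers hypothetical).
    import math

    def n_reviews_needed(pi, rel_err=0.2, z=1.96):
        return math.ceil(z**2 * pi * (1 - pi) / (rel_err * pi) ** 2)

    for pi in (0.1, 0.01, 0.001):
        print(pi, n_reviews_needed(pi))   # ~865, ~9508, ~95944

So at 0.1% prevalence you would need on the order of 10^5 expert reviews for a tight estimate, which may be part of why these papers invest in automatic prior estimation rather than manual counting.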

  1. Jain, S., White, M. & Radivojac, P. Recovering true classifier performance in positive-unlabeled learning. AAAI (2017). https://www.ccs.neu.edu/home/radivojac/papers/jain_aaai_2017.pdf
  2. Jain, S., White, M., Trosset, M. W. & Radivojac, P. Nonparametric semi-supervised learning of class proportions. arXiv:1601.01944 [cs, stat] (2016).

submitted by /u/CacheMeUp