[D] Metrics for classification with similar classes
I am having trouble finding the right metric for my problem.
I am predicting tags from text (multiclass classification) and I need a metric that lets me evaluate and compare models.
I take the top n classes with the highest probability as the model output. I can’t monitor just precision/recall/F1 because some of my classes are very similar. For example, let's say that text X has the classes [ocean, fish, boat] and my predictions are [sea, shark, boat]. I would say that this is correctly classified, because sea and ocean are very similar, and the same goes for shark and fish. I am currently using word2vec: for each prediction I take the maximum cosine similarity between that prediction and any of the labels, and then I average these maxima over all predictions. The problem with this metric is (I suspect) that its value decreases as I increase n (the number of top predictions), presumably because lower-ranked predictions tend to match the labels less well and pull the mean down.
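To make the current metric concrete, here is a minimal sketch of it. The 2-d vectors in `toy_vectors` are made up for illustration and only stand in for real word2vec embeddings; the names `cosine_matrix` and `mean_max_cosine` and the extra tag `car` are likewise hypothetical.

```python
import numpy as np

# Made-up 2-d vectors standing in for word2vec embeddings (assumption, not real data).
toy_vectors = {
    "ocean": [1.0, 0.0], "sea": [0.9, 0.1],
    "fish": [0.0, 1.0], "shark": [0.1, 0.9],
    "boat": [0.7, 0.7], "car": [-1.0, 0.2],
}

def embed(tags):
    """Look up the vector of each tag and stack them into a matrix."""
    return np.array([toy_vectors[t] for t in tags], dtype=float)

def cosine_matrix(a, b):
    """Pairwise cosine similarity: rows index a, columns index b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mean_max_cosine(pred_tags, label_tags):
    """For each prediction take its best-matching label, then average."""
    sims = cosine_matrix(embed(pred_tags), embed(label_tags))
    return float(sims.max(axis=1).mean())

labels = ["ocean", "fish", "boat"]
score_top3 = mean_max_cosine(["sea", "shark", "boat"], labels)
# Adding a fourth, unrelated prediction pulls the average down.
score_top4 = mean_max_cosine(["sea", "shark", "boat", "car"], labels)
print(score_top3, score_top4)
```

With these toy vectors the score for the top 3 is higher than for the top 4, which is the effect I suspect: every extra prediction that does not match a label lowers the mean.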
I think a good metric would be something that combines cosine similarity with precision.
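One possible way to combine the two, as a rough sketch rather than a settled answer: count a prediction as a hit if its cosine similarity to at least one label reaches a threshold, and compute precision, recall and F1 from those soft hits. The threshold of 0.8, the function name `soft_prf`, the extra tag `car` and the 2-d toy vectors (stand-ins for word2vec embeddings) are all assumptions made for illustration.

```python
import numpy as np

# Made-up 2-d vectors standing in for word2vec embeddings (assumption, not real data).
toy_vectors = {
    "ocean": [1.0, 0.0], "sea": [0.9, 0.1],
    "fish": [0.0, 1.0], "shark": [0.1, 0.9],
    "boat": [0.7, 0.7], "car": [-1.0, 0.2],
}

def cosine_matrix(a, b):
    """Pairwise cosine similarity: rows index a, columns index b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def soft_prf(pred_tags, label_tags, threshold=0.8):
    """Precision/recall/F1 where a match means cosine similarity >= threshold."""
    preds = np.array([toy_vectors[t] for t in pred_tags], dtype=float)
    labels = np.array([toy_vectors[t] for t in label_tags], dtype=float)
    sims = cosine_matrix(preds, labels)
    # A prediction is a hit if it is close enough to at least one label.
    precision = float((sims.max(axis=1) >= threshold).mean())
    # A label is covered if at least one prediction is close enough to it.
    recall = float((sims.max(axis=0) >= threshold).mean())
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1

labels = ["ocean", "fish", "boat"]
p3, r3, f3 = soft_prf(["sea", "shark", "boat"], labels)
p4, r4, f4 = soft_prf(["sea", "shark", "boat", "car"], labels)
print(p3, r3, f3)
print(p4, r4, f4)
```

In this sketch precision still drops when n grows and the extra predictions are unrelated, but recall does not, so reporting both (or F1) at a fixed n makes models comparable. The threshold would need tuning on real embeddings.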