[D] To use triplet loss or not when class labels are given? A question about theoretical/experimental expectations.
Hi, I'm looking for theoretical (or experimental) evidence for (or against) the superiority of the triplet loss over the cross-entropy loss. Do you know of any research papers that benchmark the following setup?
- Let's say we have a fixed dataset containing M images with annotated labels, e.g. MNIST.
- We train two models with the same architecture: one with regular categorical cross-entropy, and a second one with a triplet-loss (or contrastive) approach.
Since the dataset and model architecture are fixed (I assume all other hyperparameters are also fixed, except maybe the learning rate and number of epochs), we end up with two models trained to minimize different objectives. I wonder whether there is any common knowledge to answer the following questions:
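For concreteness, here is a minimal numpy sketch of the two per-example objectives I mean (the margin value and the toy numbers are just illustrations, not from any paper):

```python
import numpy as np

def cross_entropy(logits, label):
    # Categorical cross-entropy for a single example:
    # -log softmax(logits)[label], computed in a numerically stable way.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the gap between anchor-positive and anchor-negative
    # Euclidean distances: max(0, d(a,p) - d(a,n) + margin).
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Cross-entropy: the correct class (index 0) has the largest logit,
# so the loss is small but nonzero.
logits = np.array([2.0, 0.5, -1.0])
print(cross_entropy(logits, 0))

# Triplet: the positive is much closer than the negative, so the
# margin is already satisfied and the loss is exactly zero.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 1.0])
print(triplet_loss(a, p, n))  # 0.0
```

Note the structural difference: cross-entropy is defined per labeled example, while the triplet loss is defined over sampled (anchor, positive, negative) triples, so the two objectives even consume the data differently.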
- Can we expect one of the approaches to achieve better test accuracy?
- Can we expect one of the approaches to generalize better to new classes (i.e. classes not present in the training dataset)? Triplet loss was first used for face recognition, so one would expect the embeddings from a triplet-trained model to be more useful for recognizing new classes.
- Are there other expected differences? (I'm aware that a model trained with triplet loss requires a different methodology for measuring performance.)
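By "different methodology" I mean something like nearest-neighbour retrieval in embedding space instead of argmax over class logits. A toy numpy sketch of 1-NN accuracy on embeddings of held-out classes (all points and labels below are made up for illustration):

```python
import numpy as np

def nn_accuracy(gallery_emb, gallery_labels, query_emb, query_labels):
    # 1-nearest-neighbour accuracy: each query embedding is assigned
    # the label of its closest gallery embedding (Euclidean distance).
    correct = 0
    for q, y in zip(query_emb, query_labels):
        dists = np.linalg.norm(gallery_emb - q, axis=1)
        correct += int(gallery_labels[np.argmin(dists)] == y)
    return correct / len(query_labels)

# Toy embeddings for two classes that were never seen during training.
gallery = np.array([[0.0, 0.0], [1.0, 1.0]])
gallery_y = np.array([0, 1])
queries = np.array([[0.1, -0.1], [0.9, 1.2]])
query_y = np.array([0, 1])
print(nn_accuracy(gallery, gallery_y, queries, query_y))  # 1.0
```

This kind of evaluation works for both models (the cross-entropy model's penultimate layer can be used as an embedding), which is one way to make the comparison fair.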
TL;DR: different objective functions should result in different models. Can we predict the performance differences without training any models?