[D] Minimum cost is not zero when calculating cross-entropy on soft labels

Written by torontoai on July 22, 2019. Posted in Reddit MachineLearning.

I am training a neural network using batches of soft labels, e.g.

y = [[0.00, 0.25, 0.25, 0.50], ... [0.75, 0.00, 0.20, 0.05]]

However, unlike with one-hot labels, if the softmax activation outputs a vector ŷ exactly equal to y (a perfect prediction), as in

y = ŷ = [0.00, 0.25, 0.25, 0.50]

the cross-entropy function is not 0:

loss = -sum(y * log(ŷ)) = 1.0397 (natural log, taking 0 * log(0) = 0)
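A minimal sketch reproducing this number in plain Python. It assumes natural logarithms (the values in the post match nats) and the usual convention 0 * log(0) = 0. The function name `soft_cross_entropy` is hypothetical and not taken from any library:

```python
import math

def soft_cross_entropy(y, y_hat):
    """Cross-entropy -sum(y * log(y_hat)) in nats for a soft label y.

    Terms with y_i == 0 are skipped, following the convention 0 * log(0) = 0;
    this also avoids evaluating math.log(0), which raises ValueError.
    """
    return sum(-p * math.log(q) for p, q in zip(y, y_hat) if p > 0)

y = [0.00, 0.25, 0.25, 0.50]
loss = soft_cross_entropy(y, y)  # prediction identical to the label
print(round(loss, 4))  # 1.0397
```

Passing y as both arguments is what "no other ŷ does better" refers to: any other probability vector in the second argument gives a larger value.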

although it is true that, given y, no other ŷ reaches a lower value.
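The claim that no other ŷ does better follows from a standard decomposition of the cross-entropy into the entropy H(y) of the label plus the Kullback-Leibler divergence from y to ŷ (natural logs, with 0 · log 0 = 0):

```latex
H(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i
  = \underbrace{-\sum_i y_i \log y_i}_{H(y)}
  + \underbrace{\sum_i y_i \log \frac{y_i}{\hat{y}_i}}_{D_{\mathrm{KL}}(y \,\|\, \hat{y})}
  \;\ge\; H(y)
```

Since D_KL(y || ŷ) ≥ 0 (Gibbs' inequality), with equality only when ŷ = y, the smallest achievable loss for a given y is its entropy H(y), which is zero only for one-hot labels.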

Moreover, the more spread out y is (i.e. the higher its entropy), the larger the minimum possible loss:

y = ŷ = [0.25, 0.25, 0.25, 0.25]

loss = -sum(y * log(ŷ)) = 1.3863
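The minimum loss for each label is its entropy H(y) = -sum(y * log(y)), and subtracting that constant from the cross-entropy gives the KL divergence, whose minimum is 0 for every label. A minimal sketch comparing a one-hot, the soft label above, and the uniform label, assuming natural logs; the helper names are hypothetical:

```python
import math

def cross_entropy(y, y_hat):
    # sum of -y_i * log(y_hat_i) in nats, skipping y_i == 0 (0 * log 0 = 0)
    return sum(-p * math.log(q) for p, q in zip(y, y_hat) if p > 0)

def entropy(y):
    # H(y) is the cross-entropy of y with itself: the minimum loss for label y
    return cross_entropy(y, y)

def kl_divergence(y, y_hat):
    # cross-entropy minus the constant H(y); zero when y_hat == y
    return cross_entropy(y, y_hat) - entropy(y)

labels = {
    "one-hot": [0.00, 0.00, 0.00, 1.00],
    "soft": [0.00, 0.25, 0.25, 0.50],
    "uniform": [0.25, 0.25, 0.25, 0.25],
}
for name, y in labels.items():
    # minimum cross-entropy grows with entropy: 0.0000, 1.0397, 1.3863 (= log 4)
    print(f"{name:8s} min CE = {entropy(y):.4f}  KL at y_hat = y: {kl_divergence(y, y):.4f}")
```

The two losses differ only by H(y), which does not depend on ŷ, so they are minimized by the same prediction.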

So my question is: would this nonzero lower bound on the loss introduce a bias when training or testing a neural network? Since the minimum achievable cost is higher for more spread-out soft labels than for more concentrated ones (down to one-hot labels, where it is zero), might the network adjust its weights and biases to reduce the loss on the more spread-out soft labels, to the detriment of the more concentrated soft labels and the one-hot labels?
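One way to probe this numerically, as a sketch assuming ŷ is produced by a softmax over logits z and that each label sums to 1: in that setting the gradient of the cross-entropy with respect to z is the standard result ŷ - y, and it equals the gradient of the KL divergence because the two differ only by the constant H(y). The code below (hypothetical helper names, arbitrary illustrative logits) checks both against ŷ - y with central finite differences:

```python
import math

def softmax(z):
    # subtract the max logit for numerical stability
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, y_hat):
    # sum of -y_i * log(y_hat_i) in nats, skipping y_i == 0 (0 * log 0 = 0)
    return sum(-p * math.log(q) for p, q in zip(y, y_hat) if p > 0)

def kl_divergence(y, y_hat):
    # cross-entropy minus the constant H(y) = cross_entropy(y, y)
    return cross_entropy(y, y_hat) - cross_entropy(y, y)

def numeric_grad(loss, y, z, eps=1e-6):
    # central finite differences of loss(y, softmax(z)) with respect to each logit
    grad = []
    for i in range(len(z)):
        up = list(z)
        up[i] += eps
        down = list(z)
        down[i] -= eps
        grad.append((loss(y, softmax(up)) - loss(y, softmax(down))) / (2 * eps))
    return grad

y = [0.00, 0.25, 0.25, 0.50]  # soft label from the post
z = [0.3, -1.2, 0.8, 0.1]     # arbitrary logits, chosen only for illustration
analytic = [q - p for q, p in zip(softmax(z), y)]  # y_hat - y
ng_ce = numeric_grad(cross_entropy, y, z)
ng_kl = numeric_grad(kl_divergence, y, z)
for g_ce, g_kl, g_ref in zip(ng_ce, ng_kl, analytic):
    # the three columns agree up to finite-difference error
    print(f"{g_ce:+.5f} {g_kl:+.5f} {g_ref:+.5f}")
```

The check only shows that the entropy offset itself contributes nothing to the gradient; it does not by itself settle how labels of different entropy interact during training.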

submitted by /u/vratiner