[D] Minimum cost is not zero when calculating cross-entropy on soft labels
I am training a neural network using batches of soft labels, e.g.
y = [[0.00, 0.25, 0.25, 0.50], ... [0.75, 0.00, 0.20, 0.05]]
However, unlike with one-hot labels, when the softmax output ŷ is exactly equal to y (a perfect prediction), as in
y = ŷ = [0.00, 0.25, 0.25, 0.50]
the cross-entropy function is not 0:
loss = -sum(y * log(ŷ)) = 1.0397
although, for this y, no other ŷ gives a lower value.
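As a quick sanity check of the numbers above, here is a minimal sketch in plain Python. The helper name `cross_entropy` and the perturbed prediction `y_hat_other` are illustrative choices, not part of any library, and the code assumes the usual convention that terms with y_i = 0 contribute nothing (0 * log 0 = 0).

```python
import math

def cross_entropy(y, y_hat):
    # Cross-entropy -sum(y * log(y_hat)), skipping terms where y_i == 0
    # (assumed convention: 0 * log 0 = 0).
    return -sum(t * math.log(p) for t, p in zip(y, y_hat) if t > 0)

y = [0.00, 0.25, 0.25, 0.50]

# Loss when the prediction matches the soft label exactly.
loss_at_target = cross_entropy(y, y)

# A hypothetical nearby prediction that still sums to 1.
y_hat_other = [0.00, 0.30, 0.25, 0.45]
loss_other = cross_entropy(y, y_hat_other)

print(round(loss_at_target, 4))      # nonzero even though y_hat == y
print(loss_other > loss_at_target)   # moving away from y increases the loss
```

This only checks one alternative ŷ, so it illustrates the claim rather than proving it.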
Moreover, the more spread out y is (closer to uniform, which is what I mean by "sparse" here), the larger the minimum possible loss:
y = ŷ = [0.25, 0.25, 0.25, 0.25]
loss = -sum(y * log(ŷ)) = 1.3863
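The minimum reached at ŷ = y is the Shannon entropy of y, so it can be tabulated per label. The following is a rough sketch in plain Python; the `entropy` helper and the `labels` dictionary are hypothetical names used only for illustration, and the one-hot row is added for comparison.

```python
import math

def entropy(y):
    # Shannon entropy of y in nats, i.e. the cross-entropy evaluated at y_hat == y.
    # Terms with y_i == 0 are skipped (assumed convention: 0 * log 0 = 0).
    return sum(t * math.log(1.0 / t) for t in y if t > 0)

# Illustrative labels, ordered from most concentrated to most spread out.
labels = {
    "one-hot": [0.00, 0.00, 0.00, 1.00],
    "soft":    [0.00, 0.25, 0.25, 0.50],
    "uniform": [0.25, 0.25, 0.25, 0.25],
}

# Minimum achievable cross-entropy for each label.
min_loss = {name: entropy(y) for name, y in labels.items()}

for name, value in min_loss.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the floor is zero only for the one-hot label and grows as the label approaches uniform, where it reaches log(4) for four classes.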
So my question is: does this nonzero lower bound on the loss introduce a bias when training or testing a neural network? Since the minimum achievable cost is higher for more spread-out (closer to uniform) soft labels than for more concentrated ones (down to zero for one-hot labels), might the network adjust its weights and biases to favor fitting the more spread-out soft labels, to the detriment of the more concentrated soft labels and the one-hot labels?