[D] Minimum cost is not zero when calculating cross-entropy on soft labels

I am training a neural network using batches of soft labels, e.g.

y = [[0.00, 0.25, 0.25, 0.50], ... [0.75, 0.00, 0.20, 0.05]] 

However, unlike with one-hot labels, if the softmax activation outputs a vector ŷ exactly equal to y (a perfect prediction), as in

y = ŷ = [0.00, 0.25, 0.25, 0.50] 

the cross-entropy function is not 0:

loss = -sum(y * log(ŷ)) = 1.0397 

although it is true that no other ŷ can reach a lower value for this y.

Moreover, the closer y is to uniform (the higher its entropy), the larger the minimum possible loss:

y = ŷ = [0.25, 0.25, 0.25, 0.25]

loss = -sum(y * log(ŷ)) = 1.3863 
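
To make the arithmetic concrete, here is a minimal NumPy sketch (the helper names cross_entropy and entropy are just for illustration) reproducing both examples. When ŷ = y, the loss equals the Shannon entropy H(y) = -sum(y * log(y)) of the label vector, which is zero only for one-hot labels:

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy -sum(y * log(ŷ)) with natural log, as used in the post."""
    return -np.sum(y * np.log(y_hat + eps))

def entropy(y, eps=1e-12):
    """Shannon entropy H(y) = -sum(y * log(y)): the lowest cross-entropy reachable for a given y."""
    return -np.sum(y * np.log(y + eps))

y1 = np.array([0.00, 0.25, 0.25, 0.50])
y2 = np.array([0.25, 0.25, 0.25, 0.25])

# A perfect prediction (ŷ = y) gives the entropy of the labels, not zero:
print(cross_entropy(y1, y1), entropy(y1))  # ~1.0397 and ~1.0397
print(cross_entropy(y2, y2), entropy(y2))  # ~1.3863 (= ln 4) and ~1.3863
```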

So my question is: does this lower bound on the loss introduce a bias when training/testing a neural network? Since the minimum cost is higher for high-entropy soft labels than for more peaked (up to one-hot) labels, might the network adjust its weights and biases in a way that prioritizes minimizing the loss on the high-entropy soft labels, to the detriment of the more peaked soft labels and one-hot labels?
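
For what it's worth, one standard way to look at the offset (not from the original post) is the decomposition CE(y, ŷ) = H(y) + KL(y ‖ ŷ). H(y) depends only on the labels and not on the network's output, so it shifts the reported loss value but not its gradient with respect to ŷ; the KL term is the part that reaches zero at ŷ = y for any y. A sketch continuing the NumPy example above (y_hat_off is a made-up imperfect prediction):

```python
# CE(y, ŷ) = H(y) + KL(y ‖ ŷ); H(y) does not depend on the network output,
# so subtracting it leaves a loss whose minimum is exactly 0 at ŷ = y for any y,
# with the same gradient with respect to ŷ as the original cross-entropy.
def kl_divergence(y, y_hat, eps=1e-12):
    """KL(y ‖ ŷ) = sum(y * (log(y) - log(ŷ))) = CE(y, ŷ) - H(y)."""
    return np.sum(y * (np.log(y + eps) - np.log(y_hat + eps)))

y_hat_off = np.array([0.10, 0.20, 0.30, 0.40])  # hypothetical imperfect prediction for y1

print(kl_divergence(y1, y1))         # ~0.0: the floor is zero even for soft labels
print(kl_divergence(y2, y2))         # ~0.0: same for uniform labels
print(kl_divergence(y1, y_hat_off))  # ~0.1218, equal to cross_entropy(y1, y_hat_off) - entropy(y1)
```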

submitted by /u/vratiner