
[D] Change loss function for testing

First of all, sorry if I do not get the terminology right; I am a newbie in machine learning.

I am training a neural network by feeding it batches of NumPy arrays, each array consisting of summary statistics (0s, 1s, and 2s) belonging to one of three possible pure data categories, e.g.

input_batch1 = np.array([[0, 1, 1, 2, 1, ...], [1, 2, 2, 1, 0, ...], ..., [0, 0, 1, 1, 1, ...]])

where, in this case, the 1st array belongs to category 2, the 2nd to category 3, and the last to category 1. I also provide the true one-hot probabilities p for calculating the loss, e.g. for the previous input:

p_batch1 = np.array([[0, 1, 0], [0, 0, 1], ..., [1, 0, 0]])

I am using the softmax activation function in the output layer, and I calculate the loss with cross-entropy:

-tf.reduce_sum(p_batch1 * tf.log(softmax_output)) 
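For intuition, here is the same softmax-plus-cross-entropy computation written in plain NumPy (a minimal sketch; the logits and labels are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Shift by the row max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Illustrative logits for a batch of 2 examples, 3 categories.
logits = np.array([[0.1, 2.0, -1.0],
                   [0.0, 0.5,  3.0]])
p_true = np.array([[0, 1, 0],      # one-hot labels
                   [0, 0, 1]])

probs = softmax(logits)
# Summed cross-entropy over the batch, matching the tf.reduce_sum line above.
loss = -np.sum(p_true * np.log(probs))
```

Note that with one-hot p, only the log-probability of the true category contributes to each example's loss.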

However, I want to test it with summary statistics resulting from combinations of the 3 possible categories, so that it predicts the proportion coming from each category. For this testing, I would provide "non-one-hot" probability distributions containing the proportions of the 3 categories that produced the input summary statistics, like

p_mixed = np.array([[0.2, 0.8, 0.0], [0.7, 0.2, 0.1], ... [0.2, 0.3, 0.5]]) 

where the first array specifies that the input data resulted from a combination of 20% category 1 and 80% category 2.

I understand that the proper loss function for this kind of probability distribution would be the mean squared difference:

tf.reduce_mean(tf.squared_difference(softmax_output, p_mixed)) 
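As a sanity check, the same mean squared difference in NumPy (the `softmax_output` values here are made-up predictions, just to show the shapes involved):

```python
import numpy as np

# Predicted distributions vs. true mixture proportions; both are
# (batch, 3) arrays whose rows sum to 1.
softmax_output = np.array([[0.25, 0.70, 0.05],
                           [0.60, 0.25, 0.15]])
p_mixed = np.array([[0.2, 0.8, 0.0],
                    [0.7, 0.2, 0.1]])

# Squared difference averaged over all entries, matching
# tf.reduce_mean(tf.squared_difference(...)).
mse = np.mean((softmax_output - p_mixed) ** 2)  # → 0.005
```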

So my questions are:

  1. Is it possible to use a different loss function when testing?
  2. Can I train the network on summary statistics from pure categories with their true one-hot probabilities, using cross-entropy as the loss function, then test it on mixed summary statistics with the true proportions of each category, evaluating with mean squared difference as the loss function, and still expect good accuracy?
  3. Should I expect better results if I use the mean squared difference as the training loss function too? Or does it not work well for one-hot probabilities?
  4. Would I be better off training it with summary statistics from mixed categories? I'd rather use the pure categories for training, but I could do this if it really is best practice.
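For what it's worth, the train-with-cross-entropy / evaluate-with-MSE workflow the questions describe can be sketched end to end with a single softmax layer in NumPy. Everything here is a toy stand-in: the synthetic "summary statistics", the crude mixing-by-averaging of feature vectors, and all the names are assumptions for illustration only, not the poster's actual data or model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sample_pure(category, n, length=20):
    # Toy stand-in for real summary statistics: symbols 0/1/2 drawn
    # with a category-dependent bias (purely synthetic).
    w = np.eye(3)[category] * 0.6 + 0.2
    return rng.choice([0, 1, 2], size=(n, length), p=w / w.sum())

# Training set: pure categories with one-hot labels, as in the post.
X = np.vstack([sample_pure(c, 100) for c in range(3)]).astype(float)
Y = np.repeat(np.eye(3), 100, axis=0)

# Single softmax layer trained with summed cross-entropy via gradient descent.
W = np.zeros((X.shape[1], 3))
b = np.zeros(3)
for _ in range(300):
    P = softmax(X @ W + b)
    grad = P - Y                       # gradient of cross-entropy w.r.t. logits
    W -= 0.01 * (X.T @ grad) / len(X)
    b -= 0.01 * grad.mean(axis=0)

# Evaluation: mixed inputs (here crudely simulated by averaging pure feature
# vectors in the target proportions) scored with MSE against those proportions.
p_mixed = np.array([0.2, 0.8, 0.0])
x_mix = 0.2 * sample_pure(0, 5) + 0.8 * sample_pure(1, 5)
pred = softmax(x_mix @ W + b)
mse = np.mean((pred - p_mixed) ** 2)
```

The key point the sketch illustrates: nothing ties the training loss to the evaluation metric, so switching to MSE at test time is mechanically fine; whether the softmax outputs actually track mixture proportions depends on how the mixing affects the inputs.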

submitted by /u/vratiner