[D] Change loss function for testing
First of all, sorry if I don't get the terminology right; I am a newbie in machine learning.
I am training a neural network by feeding it batches of NumPy arrays. Each array consists of summary statistics (0s, 1s, and 2s) and belongs to one of three possible pure data categories, e.g.
input_batch1 = np.array([[0, 1, 1, 2, 1, ...], [1, 2, 2, 1, 0, ...], ..., [0, 0, 1, 1, 1, ...]])
where, in this case, the 1st array belongs to category 2, the 2nd to category 3, and the last to category 1. I also provide the true one-hot probabilities p for calculating the loss, e.g. for the previous input:
p_batch1 = np.array([[0, 1, 0], [0, 0, 1], ..., [1, 0, 0]])
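If the category of each array is stored as an integer (1, 2 or 3, numbered as in the text above), the one-hot targets can be built with an identity-matrix lookup. This is a minimal sketch; `labels` is a hypothetical array of 1-based category indices, not part of the original setup.

```python
import numpy as np

# Hypothetical 1-based category labels for the three example arrays above
# (category 2, category 3, category 1).
labels = np.array([2, 3, 1])

# Row i of np.eye(3) is the one-hot vector for class i (0-based),
# so subtract 1 to convert the 1-based categories.
p_batch1 = np.eye(3)[labels - 1]

print(p_batch1)
```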
I am using the softmax activation function in the output layer, and I calculate the loss with cross-entropy:
-tf.reduce_sum(p_batch1 * tf.log(softmax_output))
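For reference, the same cross-entropy can be written out in NumPy. Taking the log of an already-computed softmax can give `-inf` when a probability underflows to 0, so this sketch works from the pre-softmax outputs (logits) and uses the log-sum-exp trick instead; `logits` is an assumed name for the output layer before the softmax, and its values below are made up. In TensorFlow, `tf.nn.softmax_cross_entropy_with_logits` computes this per example directly from logits. Note that `tf.log` is the TensorFlow 1.x spelling; TensorFlow 2.x only provides `tf.math.log`.

```python
import numpy as np

def log_softmax(logits):
    # Subtract the row-wise max before exponentiating (log-sum-exp trick)
    # so that np.exp never overflows.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(p, logits):
    # Same quantity as -tf.reduce_sum(p * tf.log(softmax(logits))),
    # summed over the batch and the 3 categories.
    return -np.sum(p * log_softmax(logits))

# Hypothetical network outputs (before softmax) for a batch of 3 arrays.
logits = np.array([[0.1, 2.0, -1.0],
                   [0.0, 0.5, 3.0],
                   [2.5, 0.2, 0.1]])
p_batch1 = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])

print(cross_entropy(p_batch1, logits))
```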
However, I want to test it with summary statistics that result from combining the 3 categories, so that it predicts the proportion of each category. For this test, I would provide “non-one-hot” probability distributions containing the proportions of the 3 categories that produced the input summary statistics, like
p_mixed = np.array([[0.2, 0.8, 0.0], [0.7, 0.2, 0.1], ..., [0.2, 0.3, 0.5]])
where the first row specifies that the input data resulted from combining 20% of category 1 and 80% of category 2.
I understand that the proper loss function for this kind of probabilities should be the mean squared difference between the predicted and the true proportions.
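As a sketch of both losses on a mixed target, assuming `q` holds the network's softmax outputs for the mixed inputs (the `q_close` and `q_far` values below are made up for illustration): the mean squared difference averages the squared error over every entry, while cross-entropy is still defined when the targets are proportions rather than one-hot vectors, and for a fixed target `p` it is smallest when `q` equals `p`.

```python
import numpy as np

def mean_squared_difference(p, q):
    # Average of (p - q)^2 over all rows and categories,
    # equivalent to tf.reduce_mean(tf.square(p - q)).
    return np.mean((p - q) ** 2)

def soft_cross_entropy(p, q, eps=1e-12):
    # Cross-entropy per row, averaged over the batch; works for any target
    # distribution p, not only one-hot vectors. eps guards against log(0).
    return -np.mean(np.sum(p * np.log(q + eps), axis=1))

p_mixed = np.array([[0.2, 0.8, 0.0],
                    [0.7, 0.2, 0.1],
                    [0.2, 0.3, 0.5]])

# Hypothetical softmax outputs: one close to the targets, one far from them.
q_close = np.array([[0.25, 0.70, 0.05],
                    [0.65, 0.25, 0.10],
                    [0.20, 0.35, 0.45]])
q_far = np.array([[0.90, 0.05, 0.05],
                  [0.10, 0.10, 0.80],
                  [0.60, 0.30, 0.10]])

for name, q in [("close", q_close), ("far", q_far)]:
    print(name, mean_squared_difference(p_mixed, q), soft_cross_entropy(p_mixed, q))
```

Both losses rank the two hypothetical predictions the same way here; they differ mainly in how strongly they penalize a confident wrong proportion.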
So my questions are:
- Is it possible to use a different loss function when testing?
- Can I train the network on summary statistics from pure categories, with the true one-hot probabilities as targets and cross-entropy as the loss function, then test it on summary statistics from mixed categories, with the true proportions of each category as targets and the mean squared difference as the evaluation loss, and expect good accuracy?
- Should I expect better results if I use the mean squared difference as the training loss function from the start? Or does it not work well with one-hot probabilities?
- Would I be better off training it with summary statistics from mixed categories? I’d rather use the pure categories for training, but I could switch if that really is the best practice.
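On the first two questions: the loss used for training and the metric used for testing are independent, since a test metric is only a function of the predictions and the targets and never updates the weights. A minimal sketch, assuming `predict_proba` stands in for whatever returns the trained network's softmax outputs (in TF1 that would be a `sess.run` on the softmax tensor); it is a hypothetical placeholder, and `x_mixed` is made-up test data.

```python
import numpy as np

def evaluate(q, p):
    """Score predicted proportions q against true proportions p.

    These metrics never feed back into training; they only score predictions.
    """
    return {
        # Mean squared difference over all rows and categories.
        "mse": float(np.mean((q - p) ** 2)),
        # Fraction of rows whose dominant category is predicted correctly.
        "top1_agreement": float(np.mean(q.argmax(axis=1) == p.argmax(axis=1))),
    }

def predict_proba(x):
    # Hypothetical stand-in for the trained network's softmax output;
    # replace with the real model call. Here: uniform predictions.
    return np.full((x.shape[0], 3), 1.0 / 3.0)

# Hypothetical mixed test inputs and their true category proportions.
x_mixed = np.array([[0, 1, 1, 2, 1],
                    [1, 2, 2, 1, 0],
                    [0, 0, 1, 1, 1]])
p_mixed = np.array([[0.2, 0.8, 0.0],
                    [0.7, 0.2, 0.1],
                    [0.2, 0.3, 0.5]])

print(evaluate(predict_proba(x_mixed), p_mixed))
```

Training with cross-entropy and scoring with the mean squared difference is a valid combination; whether a network trained only on pure categories generalizes to mixtures is something the test metric measures, not something the choice of metric guarantees.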