[D] I’m trying to implement ‘Born Again Neural Networks’ by T. Furlanello.
Hi, I’m trying to implement ‘Born Again Neural Networks (BAN)’ by T. Furlanello (https://arxiv.org/abs/1805.04770), and I have some questions. Can anybody help with this, please?
If you have read some papers on knowledge distillation, you will know that some papers released before BAN used a so-called temperature. In those papers, the logits were divided by the temperature (a positive number, usually greater than 1) and the result was then passed through softmax.
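For anyone unfamiliar with it, here is a minimal sketch of that temperature scaling in plain Python. The function name `softmax_with_temperature` and the logits are made up for illustration; they are not taken from the BAN paper or any particular codebase.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide the logits by the temperature, then apply softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from an over-confident teacher.
logits = [10.0, 2.0, 1.0, 0.5, 0.0]

for t in (1.0, 4.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 4) for p in probs]}")
```

With `temperature=1.0` this is ordinary softmax; larger values flatten the distribution so the non-maximum classes get more probability mass.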
The BAN paper, however, doesn’t mention temperature. So I didn’t scale the logits (in other words, I set the temperature to 1), but it didn’t work: there was no significant difference between the original network and the distilled network. My guess is that if I just set the temperature to 1 and the network overfits, the output distribution won’t provide any meaningful ‘dark knowledge’.
For example, there would be almost no difference between [1, 0, 0, 0, 0] and [0.999, 0.000 …, 0.000 …, 0.000 …, 0.000 …].
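To make that guess concrete, here is a rough sketch comparing the gradient a student would receive from a one-hot target and from a near-one-hot teacher output. It assumes the distillation loss is cross-entropy between the teacher probabilities and the student softmax, in which case the gradient with respect to the student logits is (student probabilities minus teacher probabilities). The teacher values and student logits below are made up for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_grad(student_logits, target_probs):
    """Gradient of cross-entropy(target_probs, softmax(student_logits))
    with respect to the student logits: q - p (valid when p sums to 1)."""
    q = softmax(student_logits)
    return [qi - pi for qi, pi in zip(q, target_probs)]

one_hot = [1.0, 0.0, 0.0, 0.0, 0.0]
# Hypothetical over-confident teacher output at temperature 1.
near_one_hot = [0.999, 0.00025, 0.00025, 0.00025, 0.00025]
# Hypothetical student logits.
student_logits = [1.0, 0.5, 0.2, -0.3, 0.0]

g_hard = soft_target_grad(student_logits, one_hot)
g_soft = soft_target_grad(student_logits, near_one_hot)

# The largest per-class gap equals the largest gap between the two targets.
max_diff = max(abs(a - b) for a, b in zip(g_hard, g_soft))
print("max gradient difference:", max_diff)
```

Under these assumptions the two gradients differ by at most about 0.001 per class, so training on such a teacher would be nearly the same as training on the hard labels.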
So, do I have to apply a temperature even though the authors didn’t mention it in the paper? I’m wondering whether I can succeed without applying one. Thank you for reading.