[D] Linear Networks For Classification
We are glad to present and discuss our paper, Linear Distillation Learning (LDL): a simple remedy that improves the performance of linear networks through distillation.
In deep learning, distillation often lets a smaller/shallower network mimic a larger model far more accurately than a network of the same size trained directly on one-hot targets, which cannot match the cumbersome model. Our neural networks without activation functions achieve high classification scores from small amounts of data on the MNIST and Omniglot datasets.
The approach trains a separate linear function for each class in the dataset to imitate the output of a linear teacher network on that class. Once the models are trained, classification is performed by novelty detection: a test point is assigned to the class whose student best matches the teacher on it. Our framework distills randomized prior functions for the data; since the prior functions are linear, combining them with bootstrap methods yields a Bayesian posterior.
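To make the per-class distillation and novelty-detection idea concrete, here is a minimal toy sketch (not the paper's actual implementation): a fixed random linear teacher acts as the prior, one linear student per class is fit by least squares on that class's data only, and a test point is classified by the class whose student has the lowest distillation error. All names, dimensions, and the closed-form fit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, d_in, d_out = 3, 16, 8

# Fixed random linear "teacher" -- a randomized prior function.
W_teacher = rng.normal(size=(d_in, d_out))

def fit_student(X):
    """Fit a linear student to mimic the teacher on one class's data.

    With fewer samples than input dimensions, the minimum-norm
    least-squares solution matches the teacher only on that class's
    subspace, so other classes incur a larger distillation error.
    """
    Y = X @ W_teacher
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

# Toy data: each class is a Gaussian blob around its own mean.
means = rng.normal(scale=5.0, size=(n_classes, d_in))
train = [means[c] + rng.normal(size=(5, d_in)) for c in range(n_classes)]
students = [fit_student(X) for X in train]

def predict(x):
    """Novelty detection: pick the class whose student best
    imitates the teacher on x (smallest distillation error)."""
    errs = [np.linalg.norm(x @ W - x @ W_teacher) for W in students]
    return int(np.argmin(errs))
```

In this sketch the "model" per class is just a weight matrix, and classification never needs a softmax: the per-class imitation error itself is the score, which is what makes the novelty-detection view natural.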