[R]Research Guide: Model Distillation Techniques for Deep Learning
Knowledge distillation is a model compression technique whereby a small network (student) is taught by a larger trained neural network (teacher). The smaller network is trained to behave like the large neural network. This enables the deployment of such models on small devices such as mobile phones or other edge devices. In this guide, we’ll look at a couple of papers that attempt to tackle this challenge.