[D] Compressing a neural network by training a smaller one to mimic it
I'm curious whether this idea has been tried anywhere. I can't find any results on arXiv, but I suspect I'm using the wrong search terms.
I'm wondering whether a feasible way to reduce the number of layers in a deep network is to train a second, much smaller network to approximate the behaviour of the large network between its input layer and its second-to-last layer.
As an example, say we have a network with 100 layers and we train it until it reaches the desired accuracy. After training, I create a smaller network with maybe 5-10 layers, whose input is my data and whose target output is the activations of the 99th layer of the first network. I hope this makes sense.
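To make the idea concrete, here is a rough sketch in plain NumPy. Everything in it is an assumption for illustration only: the "large" network is a 20-layer tanh MLP with fixed random weights standing in for a trained model, the small network has a single hidden layer, and the names `teacher_penultimate`, `student`, and all the sizes are made up. The small network is fit by gradient descent on the mean squared error between its output and the large network's second-to-last-layer activations.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, hidden, n_samples = 16, 32, 512

# Stand-in for the trained large network: 20 layers with fixed random weights.
# (Assumption: a real experiment would load trained weights instead.)
teacher_weights = [rng.normal(0, 1 / np.sqrt(dim), (dim, dim)) for _ in range(20)]

def teacher_penultimate(x, weights):
    """Run x through every teacher layer except the last one."""
    h = x
    for w in weights[:-1]:
        h = np.tanh(h @ w)
    return h

def student(x, w1, w2):
    """Small network: one tanh hidden layer, linear output."""
    h = np.tanh(x @ w1)
    return h, h @ w2

x = rng.normal(size=(n_samples, dim))
target = teacher_penultimate(x, teacher_weights)  # second-to-last activations

w1 = rng.normal(0, 1 / np.sqrt(dim), (dim, hidden))
w2 = rng.normal(0, 1 / np.sqrt(hidden), (hidden, dim))

lr = 0.1
losses = []
for step in range(1000):
    h, pred = student(x, w1, w2)
    err = pred - target
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean squared error through the two student layers.
    grad_pred = 2 * err / err.size
    grad_w2 = h.T @ grad_pred
    grad_h = grad_pred @ w2.T
    grad_w1 = x.T @ (grad_h * (1 - h ** 2))
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2

# Reuse the large network's final layer on top of the small network's output.
student_output = student(x, w1, w2)[1] @ teacher_weights[-1]
print(f"MSE on penultimate activations: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The last step reuses the large network's final layer unchanged, so only layers 1 to 99 (here 1 to 19) are replaced. Whether a 5-10 layer network can actually match a trained 100-layer one this way is exactly the open question; the sketch only shows the training setup.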
Has anyone come across references for something like this?