[D] Deep Learning has a size problem. We need to focus on state-of-the-art efficiency, not state-of-the-art accuracy.
I’m not sure the recent trend of larger and larger models is going to help make deep learning more useful or applicable. Multi-billion parameter models might add a few percentage points of accuracy, but they don’t make it easier to build DL-powered applications or help other people start using the technology.
At the same time, there are some incredible results out there applying techniques like distillation, pruning, and quantization. I’d love for it to be standard practice to apply these techniques to more projects to see just how small and efficient we can make models.
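To make the "how small" part a bit more concrete, here's a minimal sketch of one of these techniques: post-training symmetric int8 quantization of a single weight tensor, in plain Python. The function names and the toy weights are hypothetical and only illustrate the idea. Real toolkits add per-channel scales, calibration data, and integer kernels on top of this.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantization (illustrative, hypothetical helper).

    Maps float weights to integers in [-127, 127] using one shared scale,
    chosen so the largest-magnitude weight lands on +/-127.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the int8 values and the scale."""
    return [v * scale for v in q]


# Toy "layer" weights (made up for illustration).
weights = [0.42, -1.27, 0.003, 0.9, -0.35]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage cost: 32-bit floats vs 8-bit signed integers (plus one float scale).
fp32_bytes = len(array("f", weights).tobytes())
int8_bytes = len(array("b", q).tobytes())

print("int8 values:", q)
print("max abs error:", max(abs(w - r) for w, r in zip(weights, restored)))
print("bytes fp32 vs int8:", fp32_bytes, int8_bytes)
```

Each weight shrinks from 4 bytes to 1 (about a 4x reduction, ignoring the single stored scale), and the round-trip error per weight is bounded by half a quantization step. Whether that error matters for accuracy is exactly the kind of thing that has to be checked per model.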
For anyone interested in the topic, I wrote up a brief primer on the problem and some research into solutions. I’d love to hear about any successes or failures people here have had with these techniques in production settings.