[Discussion] Scaling a massive Deep Learning model. Opinions on the method described?
A week ago, at Hugging Face, we released an app that uses GPT-2 as a writing assistant. It required serving GPT-2 as a backend, and it is a very heavy model (the medium-sized one weighs in at 1.7GB).
I wrote a Medium article detailing the approach we took to scale it and stay online for the ~10,000 users we had in the first few days. I would really like to hear your opinion on the matter, and whether you have used other methods to take full advantage of the machines your models run on.
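One common way to make full use of a machine serving a heavy model like GPT-2 is dynamic request batching: instead of running one forward pass per user, incoming requests are collected for a short window and processed together, so a single pass serves many users. As a rough illustration (this is a minimal sketch with a stand-in `fake_model`, not necessarily the approach from the article):

```python
import queue
import threading
import time

class BatchingServer:
    """Collects incoming requests and runs the model on whole batches,
    so one forward pass serves several users at once."""

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.05):
        self.model_fn = model_fn          # callable: list of inputs -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s      # how long to wait for more requests
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, prompt):
        """Called from a request handler; blocks until the result is ready."""
        done = threading.Event()
        item = {"input": prompt, "event": done, "output": None}
        self._queue.put(item)
        done.wait()
        return item["output"]

    def _run(self):
        while True:
            batch = [self._queue.get()]   # block until the first request arrives
            deadline = time.monotonic() + self.max_wait_s
            # fill the batch until it is full or the wait window closes
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.model_fn([item["input"] for item in batch])
            for item, out in zip(batch, outputs):
                item["output"] = out
                item["event"].set()

# toy stand-in for the GPT-2 forward pass (a real server would call the model here)
def fake_model(prompts):
    return [p + "!" for p in prompts]

server = BatchingServer(fake_model)
```

The trade-off is a small added latency (the batching window) in exchange for much higher throughput per machine, which tends to matter most for large models where the per-pass cost dominates.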
Here is the Medium post.
What do you think?