At Hugging Face, we recently started distilling models, beginning with DistilBERT, a distilled version of BERT. We have now distilled the small version of GPT-2. The resulting DistilGPT-2 compares to GPT-2/small as follows:
81.9M parameters vs. 124M for GPT-2/small (66% of the parameters)
Weighs 336 MB on disk vs. 523 MB for GPT-2/small (64% of the disk size)
On both CPU and GPU, the average forward pass of DistilGPT-2 takes 51% of the time of GPT-2/small, i.e. roughly twice as fast (see the timing sketch after this list)
The absolute increase in perplexity on WikiText-103 is 3.5 points (15.0 → 18.5)
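For illustration, here is a minimal timing sketch of the kind of forward-pass comparison described above. It assumes the Hugging Face transformers and torch packages are installed and that the checkpoints are available under the Hub ids "gpt2" and "distilgpt2"; the batch, sequence length, and number of runs are arbitrary choices, not the benchmark setup used for the numbers above.

```python
# Rough forward-pass latency comparison: GPT-2/small vs. DistilGPT-2.
# Not the official benchmark; just a sketch of how one might measure it.
import time

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # same tokenizer for both models
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

def mean_forward_ms(model_name, n_runs=50):
    """Average the time of a single forward pass over n_runs repetitions."""
    model = GPT2LMHeadModel.from_pretrained(model_name).eval()
    with torch.no_grad():
        model(**inputs)  # warm-up pass
        start = time.perf_counter()
        for _ in range(n_runs):
            model(**inputs)
    return (time.perf_counter() - start) / n_runs * 1000

for name in ("gpt2", "distilgpt2"):
    print(f"{name}: {mean_forward_ms(name):.1f} ms per forward pass")
```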
We have added it to our Write With Transformer app, as well as to our two repos: transformers (along with a tutorial on how to distill transformers and example scripts!) and swift-coreml-transformers. We have successfully run it on an iPhone 7, and on an iPhone X with the Neural Engine it is 38% faster than GPT-2.
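As a quick usage sketch, one might load the distilled model through the transformers library like this, assuming the checkpoint is published under the Hub id "distilgpt2" (the prompt and sampling parameters are illustrative only):

```python
# Generate text with DistilGPT-2 via the transformers library.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")

# Encode a prompt, sample a continuation, and decode it back to text.
input_ids = tokenizer.encode("Toronto is a city where", return_tensors="pt")
output_ids = model.generate(input_ids, max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```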