Test a Distilled GPT-2’s generative capabilities
At Hugging Face, we recently started distilling models, beginning with DistilBERT, a distilled version of BERT. We have now distilled the small version of GPT-2, which has the following characteristics:
- 81.9M parameters vs. 124M for GPT-2/small (66% of the parameters);
- 336MB on disk vs. 523MB for GPT-2/small (64% of the disk size);
- on both CPU and GPU, the average forward pass of DistilGPT-2 takes 51% of the time of GPT-2/small, i.e. it is roughly twice as fast (see the timing sketch after this list);
- the absolute increase in perplexity on WikiText-103 is 3.5 points (from 15.0 to 18.5).
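To reproduce a speed comparison like this on your own hardware, a minimal timing sketch with the transformers library might look like the following. The model identifiers `gpt2` and `distilgpt2` are the names on the Hugging Face model hub; the test sentence, run count, and helper name `mean_forward_ms` are illustrative choices, and exact timings will vary by machine:

```python
import time

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

def mean_forward_ms(model_name, text, n_runs=50):
    """Average forward-pass time in milliseconds for a single input."""
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()
    input_ids = tokenizer.encode(text, return_tensors="pt")
    with torch.no_grad():
        model(input_ids)  # warm-up pass so one-time initialization is not timed
        start = time.perf_counter()
        for _ in range(n_runs):
            model(input_ids)
    return (time.perf_counter() - start) / n_runs * 1000

sentence = "Distillation makes transformer models smaller and faster."
for name in ("gpt2", "distilgpt2"):  # hub identifiers for the two models
    print(f"{name}: {mean_forward_ms(name, sentence):.1f} ms per forward pass")
```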
We have added it to our Write With Transformer app, as well as to our two repos, transformers (along with a tutorial on how to distill transformers and example scripts!) and swift-coreml-transformers. We have successfully run it on an iPhone 7, and on an iPhone X with the Neural Engine it is 38% faster than GPT-2.
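To try the distilled model’s generative capabilities directly, here is a minimal sampling sketch, again with the transformers library. The prompt and the decoding parameters (such as `top_k=50` and `max_length=40`) are illustrative choices, not tuned values from our experiments:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("distilgpt2")
model = GPT2LMHeadModel.from_pretrained("distilgpt2")
model.eval()

prompt = "Machine learning models are getting"  # illustrative prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_length=40,   # total length, prompt included
        do_sample=True,  # sample rather than decode greedily
        top_k=50,        # limit sampling to the 50 most likely tokens
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Swapping "distilgpt2" for "gpt2" lets you compare the two models’ generations side by side.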