[R] DistilBERT: A smaller, faster, cheaper, lighter BERT trained with distillation!
HuggingFace released their first NLP transformer model, "DistilBERT", a distilled version of the BERT architecture: only 66 million parameters (instead of BERT-base's 110 million) while keeping 95% of BERT's performance on GLUE.
They released a blog post detailing the distillation procedure, with a hands-on walkthrough.
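At its core, the distillation procedure trains the small student to match the large teacher's temperature-softened output distribution. The sketch below is a minimal illustration of that soft-target loss (the temperature `T` and the `T**2` scaling follow Hinton et al.'s distillation recipe), not HuggingFace's actual training code:

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T softens the distribution."""
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on temperature-softened probabilities,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

When the student reproduces the teacher's logits exactly, the loss is zero; the further its softened distribution drifts from the teacher's, the larger the penalty. In practice this soft loss is combined with the usual hard-label cross-entropy.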
DistilBERT is also available in their pytorch-transformers repository, alongside 7 other transformer models.
submitted by /u/jikkii