[N] French BERT (CamemBERT) now available in Transformers library
The CamemBERT Transformer model (by Facebook AI, Inria and Sorbonne Université), trained on 138 GB of French text, was added this morning to the huggingface/transformers model repository and is now usable in both PyTorch and TensorFlow 2! Install the library from source to play around with it.
It is available alongside Chinese and German BERT models and other multilingual models.
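
For anyone who wants to try it quickly, here is a minimal sketch of a masked-word query. It assumes a recent release of the library that ships the `pipeline` helper, and that the checkpoint is published under the name `camembert-base`; the function name `fill_mask_demo` and the example sentence are just illustrations.

```python
def fill_mask_demo(text="Le camembert est <mask> !", model_name="camembert-base"):
    """Ask the model for candidate fillers of the <mask> token in `text`.

    Assumptions: `transformers` is installed with a PyTorch or TensorFlow 2
    backend, and `model_name` refers to a published CamemBERT checkpoint.
    """
    # Imported lazily so that defining this function does not require the library.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    # Returns the pipeline's candidate completions for the masked position.
    return fill(text)
```

Calling `fill_mask_demo()` downloads the weights on first use, so it needs network access and some disk space.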
CamemBERT improves the state of the art on several French NLP tasks and outperforms multilingual models on a number of tasks. It’s based on RoBERTa’s training scheme but uses whole-word masking as well as SentencePiece tokenization.
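
To make the whole-word masking idea concrete, here is a simplified toy sketch, not the actual CamemBERT training code. It assumes SentencePiece-style pieces where the first piece of each word starts with "▁", and it masks every piece of a chosen word together. The piece split below is written by hand for illustration (not real tokenizer output), and the helper names and the `<mask>` string are placeholders. Real pretraining setups typically also replace some selected tokens with random or unchanged tokens, which is left out here.

```python
import random

WORD_START = "\u2581"  # "▁": SentencePiece marks the first piece of a word with it
MASK = "<mask>"        # placeholder mask token for this sketch


def group_into_words(pieces):
    """Return lists of piece indices, one list per whole word."""
    words = []
    for i, piece in enumerate(pieces):
        if piece.startswith(WORD_START) or not words:
            words.append([i])      # this piece opens a new word
        else:
            words[-1].append(i)    # continuation of the current word
    return words


def whole_word_mask(pieces, mask_prob=0.15, rng=None):
    """Mask whole words: either all pieces of a word are masked or none."""
    rng = rng or random.Random()
    masked = list(pieces)
    for word in group_into_words(pieces):
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = MASK
    return masked


# Hand-written split, for illustration only.
pieces = ["▁Le", "▁cam", "embert", "▁est", "▁dé", "lic", "ieux"]
print(group_into_words(pieces))  # [[0], [1, 2], [3], [4, 5, 6]]
print(whole_word_mask(pieces, mask_prob=0.5, rng=random.Random(0)))
```

With plain token-level masking, "▁cam" could be masked while "embert" stays visible, which makes the prediction much easier; grouping by word avoids that.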