[P] For NLP Researchers, Implementation of Text Preprocessing Package, PreNLP
Do very simple text preprocessing (a.k.a. dirty work) with the PreNLP package!
I work in NLP and am building this package to handle the repetitive but necessary chores of NLP work. Let me know what you would like to see added by opening an issue, and I'll implement it in the package.
Here are some examples of text preprocessing.
>>> from prenlp.data.normalization import *
>>> url_normalize('Visit this link for more details: https://github.com/', repl='[URL]')
Visit this link for more details: [URL]
>>> tag_normalize('Use HTML with the desired attributes: <img src="cat.jpg" height="100" />', repl='[TAG]')
Use HTML with the desired attributes: [TAG]
>>> emoji_normalize('Hello 🤩, I love you 💓 !', repl='[EMOJI]')
Hello [EMOJI], I love you [EMOJI] !
>>> email_normalize('Contact me at firstname.lastname@example.org', repl='[EMAIL]')
Contact me at [EMAIL]
>>> tel_normalize('Call +82 10-1234-5678', repl='[TEL]')
Call [TEL]
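For anyone curious how this kind of normalization can work, here is a rough sketch using only Python's standard `re` module. This is not PreNLP's actual code: the patterns and the `sketch_*` function names are hypothetical, made up for illustration, and the regexes are deliberately simple, so they will miss edge cases that a real package has to handle.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# PreNLP's own regexes may differ and likely cover more cases.
URL_PATTERN = re.compile(r'https?://\S+')
EMAIL_PATTERN = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')
TAG_PATTERN = re.compile(r'<[^>]+>')


def sketch_url_normalize(text, repl='[URL]'):
    """Replace every http(s) URL in text with repl."""
    return URL_PATTERN.sub(repl, text)


def sketch_email_normalize(text, repl='[EMAIL]'):
    """Replace every email-like token in text with repl."""
    return EMAIL_PATTERN.sub(repl, text)


def sketch_tag_normalize(text, repl='[TAG]'):
    """Replace every HTML-like tag in text with repl."""
    return TAG_PATTERN.sub(repl, text)


def sketch_normalize_all(text):
    """Apply the three sketches in sequence: URLs, then emails, then tags."""
    for normalize in (sketch_url_normalize,
                      sketch_email_normalize,
                      sketch_tag_normalize):
        text = normalize(text)
    return text


if __name__ == '__main__':
    print(sketch_url_normalize('Visit this link for more details: https://github.com/'))
    print(sketch_email_normalize('Contact me at firstname.lastname@example.org'))
    print(sketch_normalize_all('Mail a@b.co or see http://x.y <br>'))
```

Each normalizer is a single precompiled pattern plus a `sub` call, so adding a new placeholder type in this style mostly comes down to writing and testing one more regex.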