[P] tog: A hackable Emacs-based data-tagging framework
There are some really good tools for tagging data and building ML datasets, like doccano. Most of them are web GUIs, though, and I find those hard and annoying to extend.
A while back I built a system for tagging inside Emacs, which later grew into a sort of framework. If you live in Emacs and are willing to spend some time setting up a fast tagging workflow, you can try tog. It lets you create custom data taggers by writing a few rendering and parsing functions. I have been using it personally for the following:
- NER tagging
- Audio/text intent tagging
- Voicing texts or parses
- Triplet-ish song similarity tagging
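To make the "write a renderer and a parser" idea concrete, here is a minimal sketch in Python rather than Emacs Lisp. `Tagger`, `prompt`, `tag`, `render` and `parse` are hypothetical names, not tog's actual API. The assumption is that `render` turns a data point into the text shown to the annotator, `parse` turns the annotator's raw input into a label, and the framework takes care of storing the results.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Generic, TypeVar

Item = TypeVar("Item")
Label = TypeVar("Label")


@dataclass
class Tagger(Generic[Item, Label]):
    """Hypothetical tagger (not tog's API): the user supplies render and parse."""

    render: Callable[[Item], str]   # data point -> text shown to the annotator
    parse: Callable[[str], Label]   # annotator's raw input -> label
    labels: Dict[int, Label] = field(default_factory=dict)

    def prompt(self, item: Item) -> str:
        # What an editor buffer would display for this item.
        return self.render(item)

    def tag(self, index: int, user_input: str) -> Label:
        # Parse the annotation and store it under the item's index.
        label = self.parse(user_input)
        self.labels[index] = label
        return label


# Example: a text intent tagger (hypothetical item and label formats).
intent_tagger = Tagger(
    render=lambda utterance: f"Utterance: {utterance}\nIntent? ",
    parse=lambda raw: raw.strip().lower(),
)
print(intent_tagger.prompt("book a cab to the airport"))
intent_tagger.tag(0, "  Travel ")
```

The point of this split is that each new task (NER, intent, similarity triplets) only changes the two small functions, while storage and navigation stay shared.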
It’s all Emacs Lisp, so everything is extensible. As a recent example, on each save I query an active learning backend, which returns the next data points to tag: the ones that are hard according to the dataset tagged so far and the current model.
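The post doesn't say how "hard" is measured. One common choice is uncertainty sampling: retrain on what has been tagged, score each untagged point by the entropy of the model's predicted class distribution, and return the highest-scoring points. The Python sketch below assumes that criterion. `on_save`, `fit` and `predict_proba` are hypothetical hooks, and the request from Emacs to the backend is left out.

```python
import math
from typing import Any, Callable, Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_to_tag(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Ids of the k untagged points with the most uncertain predictions."""
    return sorted(predictions, key=lambda i: entropy(predictions[i]), reverse=True)[:k]


def on_save(
    tagged: Dict[str, Any],
    untagged: Dict[str, Any],
    fit: Callable[[Dict[str, Any]], Any],
    predict_proba: Callable[[Any, Any], Sequence[float]],
    k: int = 10,
) -> List[str]:
    """Hypothetical save hook: retrain, then pick the hardest untagged points."""
    model = fit(tagged)  # retrain on everything tagged so far
    predictions = {i: predict_proba(model, x) for i, x in untagged.items()}
    return next_to_tag(predictions, k)


# Toy usage with made-up predicted distributions.
preds = {"a": [0.98, 0.02], "b": [0.5, 0.5], "c": [0.7, 0.3]}
print(next_to_tag(preds, 2))  # ['b', 'c']
```

Entropy is only one option; least-confidence or margin scores drop into `next_to_tag` the same way.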