
[Discussion] How to ensure the quality of labeled data?

Hi MLers,

Usually we can get labeled data of a known quality from Kaggle or from workshops. In real-world usage, however, how can we ensure the quality of the labeled data? (Beyond accuracy, the more crucial point is whether the data is reliable enough to train models on.)

To be more specific, I am working on a Named Entity Recognition project (extracting and labeling product names from formal documents that describe products and their relations). We currently have a rule-based model that extracts the named entities with regular expressions, and we would now like to move the task to a machine learning workflow.
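
To make the setup concrete, here is a minimal sketch of how regex matches could be turned into token-level BIO tags for NER training. The pattern, the `PRODUCT` label, and the whitespace tokenizer are all made up for illustration; our real rules and tokenization are different.

```python
import re

# Hypothetical pattern for illustration only: names like "Acme X200".
PRODUCT_PATTERN = re.compile(r"\bAcme [A-Z]\d{3}\b")
# Naive whitespace tokenizer (an assumption, not our production tokenizer).
TOKEN_PATTERN = re.compile(r"\S+")


def bio_labels(text):
    """Return (token, tag) pairs; tokens overlapping a regex match get B-/I-PRODUCT."""
    spans = [m.span() for m in PRODUCT_PATTERN.finditer(text)]
    labeled = []
    for tok in TOKEN_PATTERN.finditer(text):
        tag = "O"
        for start, end in spans:
            # A token belongs to an entity if its character range overlaps the match.
            if tok.start() < end and tok.end() > start:
                tag = "B-PRODUCT" if tok.start() <= start else "I-PRODUCT"
                break
        labeled.append((tok.group(), tag))
    return labeled


pairs = bio_labels("The Acme X200 replaces the Acme X100.")
```

Note that with a whitespace tokenizer, trailing punctuation stays attached to the last entity token (`X100.` here), which is one of the small ways rule-generated labels can end up noisy.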

So here’s the question: if we directly use the data labeled by our rule-based model as the training data for our ML model, how can we ensure the quality of that training data before we feed it to the model?
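
One option I have been considering is a spot check: draw a random sample of documents, label them by hand, and measure how well the rule-based labels agree with the hand labels at the entity level. This is only a sketch under my own assumptions (exact span matching, hypothetical function names, spans stored as `(start, end, label)` tuples), not an established procedure.

```python
import random


def sample_for_audit(doc_ids, k, seed=0):
    """Pick up to k document ids at random for manual annotation."""
    rng = random.Random(seed)  # fixed seed so the audit sample is reproducible
    doc_ids = list(doc_ids)
    return rng.sample(doc_ids, min(k, len(doc_ids)))


def span_precision_recall(rule_spans, gold_spans):
    """Entity-level precision and recall of rule labels against hand labels.

    Both arguments map doc_id -> set of (start, end, label) tuples.
    Only documents present in gold_spans (the audited sample) are scored,
    and a span counts as correct only on an exact match.
    """
    tp = fp = fn = 0
    for doc_id, gold in gold_spans.items():
        pred = rule_spans.get(doc_id, set())
        tp += len(pred & gold)   # spans the rules got exactly right
        fp += len(pred - gold)   # spans the rules produced but annotators did not
        fn += len(gold - pred)   # spans annotators found but the rules missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data for illustration only.
rule = {"d1": {(0, 9, "PRODUCT"), (20, 25, "PRODUCT")}, "d2": {(5, 12, "PRODUCT")}}
gold = {"d1": {(0, 9, "PRODUCT")}, "d2": {(5, 12, "PRODUCT"), (30, 38, "PRODUCT")}}
precision, recall = span_precision_recall(rule, gold)
```

Low precision would mean the model is trained on wrong entities, while low recall would mean many real entities are labeled as non-entities. I am not sure which of the two hurts an NER model more in practice.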

This is my first time generating training data from scratch, so I would really appreciate any discussion. Any ideas and comments are welcome! Thanks a lot!

submitted by /u/ClassifyOrRegreddit