[D] How to deal with adding new data and new labels to existing models

Written by torontoai on June 26, 2019. Posted in Reddit MachineLearning.

Say you work at a company whose product detects humans in images. You start off with a dataset of pictures of humans annotated with bounding boxes. The company becomes a huge success and you want to develop the product further. Now you also want to detect eyes and ears, so you build a dataset for that. You have really struck gold and the market is going wild for your product, so you decide to add yet another category: an indicator of whether or not the person is ill.

My question is, how do you deal with this kind of growth in ML products?

For every category you add, you have to add annotations to the dataset. This can be a tremendous amount of work, and it might not be feasible to backfill the data you already have with the new labels. I have two suggestions for how to deal with this. The first is to annotate only the new data and train the model in two phases: in the first phase you train the model to detect only humans, and in the second phase you add outputs for eyes and ears and fine-tune on the rest of the data.
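A minimal sketch of the two-phase approach, under heavy simplifying assumptions: detection is reduced to multi-label classification (human, eyes, ears) with a linear head on fixed 8-dimensional "backbone features", and the data is random. The helper names `expand_head`, `masked_bce` and `train` are hypothetical. Phase 2 appends output units for the new classes while keeping the trained weights. The 0/1 loss mask is a swapped-in technique (one common way to handle missing labels), not something the post specifies: it drops the loss terms for labels that were never annotated, so the old images can keep training the human output without being backfilled.

```python
import numpy as np


def expand_head(W, b, n_new, rng, scale=0.01):
    """Append n_new output units to a linear head, keeping the trained ones."""
    extra = scale * rng.standard_normal((W.shape[0], n_new))
    return np.concatenate([W, extra], axis=1), np.concatenate([b, np.zeros(n_new)])


def masked_bce(logits, targets, mask):
    """Mean binary cross-entropy over annotated entries (mask == 1), plus its gradient.

    Entries with mask == 0 (labels that were never annotated) contribute
    nothing, whatever placeholder value `targets` holds there.
    """
    n = max(mask.sum(), 1.0)
    # Numerically stable BCE with logits: log(1 + e^z) - t * z
    loss = (np.logaddexp(0.0, logits) - targets * logits) * mask
    grad = (1.0 / (1.0 + np.exp(-logits)) - targets) * mask / n
    return loss.sum() / n, grad


def train(X, Y, M, W, b, lr=0.5, steps=300):
    """Full-batch gradient descent on the linear head only (backbone features are fixed)."""
    for _ in range(steps):
        _, g = masked_bce(X @ W + b, Y, M)
        W = W - lr * (X.T @ g)
        b = b - lr * g.sum(axis=0)
    return W, b


# Hypothetical toy data: 8-dim "backbone features"; outputs are (human, eyes, ears).
rng = np.random.default_rng(0)
true_W = rng.standard_normal((8, 3))
X_old = rng.standard_normal((100, 8))                    # old images: only "human" annotated
X_new = rng.standard_normal((50, 8))                     # new images: all three annotated
y_human_old = (X_old @ true_W[:, :1] > 0).astype(float)  # shape (100, 1)
Y_new = (X_new @ true_W > 0).astype(float)               # shape (50, 3)

# Phase 1: a single "human" output trained on the old data.
W, b = train(X_old, y_human_old, np.ones((100, 1)), np.zeros((8, 1)), np.zeros(1))

# Phase 2: add eye/ear outputs and fine-tune on old + new data, masking missing labels.
W, b = expand_head(W, b, n_new=2, rng=rng)
X = np.vstack([X_old, X_new])
Y = np.vstack([np.hstack([y_human_old, np.zeros((100, 2))]), Y_new])  # zeros = placeholders
M = np.vstack([np.tile([1.0, 0.0, 0.0], (100, 1)), np.ones((50, 3))])
loss_before, _ = masked_bce(X @ W + b, Y, M)
W, b = train(X, Y, M, W, b)
loss_after, _ = masked_bce(X @ W + b, Y, M)
print(f"phase-2 masked loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Initializing the new columns near zero leaves the human output essentially unchanged at the start of phase 2. Keeping the old images in phase 2, masked for eyes and ears, is one way to reduce the risk that fine-tuning on the new data alone degrades the human detector.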

Another way is to train two separate models, one for humans and one for eyes/ears. Depending on your domain, though, multiple models might not be favorable: if you have real-time constraints, you might want to have everything in one model.
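To put the real-time argument in rough numbers, here is a minimal sketch under assumed sizes. It assumes the two separate models would each need a backbone of similar size, whereas a single model with a shared backbone and one output head per task computes the expensive features once. The layer widths, head names, and the dense-layer cost model (multiply-accumulates, MACs) are all hypothetical; real detectors are convolutional, so the point is only that the backbone cost is paid once instead of twice.

```python
import numpy as np


def mlp_macs(widths):
    """Multiply-accumulates for one forward pass of a dense net with these layer widths."""
    return sum(a * b for a, b in zip(widths, widths[1:]))


# Hypothetical sizes: a backbone mapping 1024 input features to 256, and two small heads.
BACKBONE = [1024, 512, 256]
HEADS = {"human": [256, 4], "eyes_ears": [256, 8]}

heads_cost = sum(mlp_macs(w) for w in HEADS.values())
two_models = 2 * mlp_macs(BACKBONE) + heads_cost  # each model has its own backbone
one_model = mlp_macs(BACKBONE) + heads_cost       # backbone features computed once
print(two_models, one_model)  # 1313792 658432


def forward(x, backbone_ws, head_ws):
    """Shared-backbone model: run the ReLU backbone once, then every linear head."""
    h = x
    for W in backbone_ws:
        h = np.maximum(h @ W, 0.0)
    return {name: h @ W for name, W in head_ws.items()}


rng = np.random.default_rng(0)
backbone_ws = [0.01 * rng.standard_normal((a, b)) for a, b in zip(BACKBONE, BACKBONE[1:])]
head_ws = {name: 0.01 * rng.standard_normal(tuple(w)) for name, w in HEADS.items()}
out = forward(rng.standard_normal((2, 1024)), backbone_ws, head_ws)
print({k: v.shape for k, v in out.items()})  # {'human': (2, 4), 'eyes_ears': (2, 8)}
```

With these made-up sizes the shared model needs about half the compute, because the heads are tiny next to the backbone. The trade-off is that the tasks now share features, so adding a task means touching the one model everyone depends on.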

Is there anywhere I can read more about how to deal with these kinds of issues? Do you guys have any experience in dealing with issues like this?

submitted by /u/mlaway