[D] Term for keeping test and training data separate
So, I’ve been using the term “data hygiene” for the measures we take in (safety) ML to keep test and training data separate. Stuff like:
- keeping the test data on an access-controlled network share
- acquiring the test set later and by different teams
- Thresholdout (when I finally get the chance to play around with it)
But apparently I only picked that term up from some fringe paper once, and “data hygiene” actually refers to a separate concept in data science?!
Does anyone have a good term for these methods/approaches?