[Discussion] How do you prepare a new dataset for your ML/DL project?
I found many guidelines online on how to prepare, analyze, and clean datasets in tabular form (e.g. from CSV files). Typically, they compute correlations between features, look for inliers/outliers in the dataset, and remove duplicates as well as corrupt samples.
But how do you perform such steps on a raw dataset consisting only of images or text, as is typically the case in deep learning?
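To make the tabular case concrete, here is a rough sketch of the kind of cleaning those guides describe, using only the Python standard library. The function names, the all-numeric columns, and the 1.5 x IQR outlier rule are my own assumptions for illustration, not taken from any specific guide.

```python
import statistics


def drop_corrupt(rows, n_columns):
    """Drop rows with the wrong field count or values that do not parse as floats."""
    clean = []
    for row in rows:
        if len(row) != n_columns:
            continue
        try:
            clean.append([float(v) for v in row])
        except (TypeError, ValueError):
            continue
    return clean


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical rows as they might come out of csv.reader (all strings).
    raw = [["1", "10"], ["2", "11"], ["2", "11"], ["3", "abc"], ["4"], ["3", "12"]]
    rows = drop_duplicates(drop_corrupt(raw, n_columns=2))
    print(rows)
    print(iqr_outliers([10, 11, 12, 13, 14, 15, 16, 17, 500]))
```

Parsing before deduplicating means rows like "1.0" and "1" are treated as the same sample. With very few rows the IQR bounds get wide, so the outlier check is only meaningful on a reasonably sized column.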
Let’s assume I just gathered 100,000 unlabeled images. Are there any tools or guidelines on how to start from there?
Thanks a lot for your input!
submitted by /u/rogi_o