[P] How can I make my rendered training data match real data better?
I’m trying to detect a single type of box in camera images. Instead of using hand-labelled images for training, I want to create the data from a 3D model using Blender and a Python script. So far I have successfully created a dataset and trained RetinaNet on it. I apply some augmentation (color shifts, saturation changes, noise, blurring, sharpening). The results on a validation set (also consisting of synthetic data) are great, but the localization performance on real images is much worse. What changes should I make to my rendering process so that it matches real images better? Since it’s a virtual environment, I have pretty much unlimited control over everything, but I have no clue what makes sense to vary. Some of the detections are flawless, but others are way off, and I can’t tell what visual difference throws the network off.

[Image: an example of a rendered image from the training set]

[Image: excellent results on the validation set (boxes that are halfway hidden are not supposed to be detected)]
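For context, here is a minimal sketch of the kind of Blender render loop the post describes (render a box model from a Python script, varying camera and lighting per frame). The object names ("Camera", "Light", "Box"), the output path, and the specific ranges are assumptions for illustration, not taken from the actual script:

```python
# Sketch of a randomized Blender render loop (object names and paths are assumed).
# Run inside Blender, e.g.: blender scene.blend --background --python render_dataset.py
import math
import random

import bpy

scene = bpy.context.scene
cam = bpy.data.objects["Camera"]   # assumed camera object name
light = bpy.data.objects["Light"]  # assumed light object name
box = bpy.data.objects["Box"]      # assumed name of the box model

for i in range(1000):
    # Random camera position on a hemisphere around the box.
    radius = random.uniform(2.0, 6.0)
    azimuth = random.uniform(0.0, 2.0 * math.pi)
    elevation = random.uniform(math.radians(10), math.radians(80))
    cam.location = (
        box.location.x + radius * math.cos(azimuth) * math.cos(elevation),
        box.location.y + radius * math.sin(azimuth) * math.cos(elevation),
        box.location.z + radius * math.sin(elevation),
    )
    # Aim the camera at the box.
    direction = box.location - cam.location
    cam.rotation_euler = direction.to_track_quat('-Z', 'Y').to_euler()

    # Randomize light position and intensity.
    light.location = (random.uniform(-4, 4), random.uniform(-4, 4), random.uniform(2, 6))
    light.data.energy = random.uniform(100, 1500)

    # Render the frame to disk (bounding-box labels would be exported separately).
    scene.render.filepath = f"/tmp/boxes/render_{i:04d}.png"
    bpy.ops.render.render(write_still=True)
```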
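The augmentation pipeline itself isn’t shown in the post; as a rough sketch of the listed augmentations (color shifts, saturation changes, noise, blurring, sharpening), here is one way to express them with the albumentations library — an assumption, since the post doesn’t name a library, and exact argument names vary between albumentations versions. Note that all of these are purely photometric; geometry, materials, and lighting can only be varied at render time.

```python
# Rough sketch of the listed augmentations using albumentations (library choice
# and parameter values are assumptions, not taken from the original pipeline).
import albumentations as A

transform = A.Compose(
    [
        # Color and saturation shifts
        A.HueSaturationValue(hue_shift_limit=15, sat_shift_limit=30, val_shift_limit=15, p=0.7),
        A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
        # Sensor-style noise
        A.GaussNoise(p=0.4),
        # Blur or sharpen, but not both at once
        A.OneOf([A.Blur(blur_limit=3), A.Sharpen()], p=0.4),
    ],
    # Keep the box annotations consistent with the augmented image.
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

# Usage: augmented = transform(image=image, bboxes=bboxes, labels=labels)
```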