[D] Looking for opinions on using 3d models for image recognition.
I was thinking about why the training process for image recognition is such a resource intensive task, and I got caught up on the idea that information loss caused by using projections (i.e. pictures) as training data would be an interesting area to explore.
In come mesh-based 3D models. The training subject now has all of the necessary features in a single example. My question is twofold: do you see any potential in using 3D models as a source for an ML algo/NN, and if so, how would you go about doing it?
I’ve done a bit of brainstorming, but I can’t really get anywhere. My initial thought was to decompose the problem into parts and solve it from both ends. One challenge is getting an isolated object from an image to use as a target, and the other is decomposing a 3D model into a useful 2D projection that can be compared with the target.
Figuring that the 2D part would be easier, I decided to tackle that first, and I found the task of meaningful image segmentation to be a bit more difficult than I expected. It essentially came down to this: I can use pre-existing methods to segment an image, or I can make my own, but nothing actually does a decent job of pulling out objects, with the exception of NNs, which need to be trained to do so. In an attempt to improve the training process, I essentially run into the need to use the very methods I’m trying to replace.
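To make concrete what I mean by the non-NN baselines falling short: even the simplest classical approach (global thresholding plus a bounding box) only works on toy inputs like the synthetic one below. This is just a minimal sketch, with a made-up test image, not a real segmentation method:

```python
import numpy as np

def segment_by_threshold(img):
    """Crude figure/ground split: threshold at the global mean intensity.
    Returns a boolean foreground mask and its bounding box (or None)."""
    mask = img > img.mean()
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask, None
    return mask, (ys.min(), xs.min(), ys.max(), xs.max())

# Synthetic image: dark background with one bright square "object".
img = np.zeros((64, 64))
img[20:40, 25:45] = 1.0
mask, bbox = segment_by_threshold(img)
```

On anything with texture, shading, or multiple objects, a global threshold like this falls apart immediately, which is exactly the gap NNs fill.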
The 3D part has a variety of challenges as well. I figured I could do something like render an image of the model at a similar proportion to the image that contains my target, then try to align the centroid of the model with the centroid of the target. This eventually leads to iteratively translating, rotating, and scaling the model. Some other obvious issues include segmenting the 3D model, posing the model, and deciding which model to use as a reference when there are multiple ones saved.
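For the project-then-align step, here's roughly what I had in mind, as a sketch. Assumptions: an orthographic projection, a single rotation parameter about the vertical axis, and a stand-in "cube" model plus hypothetical target points; a real version would loop over pose/scale parameters and score each projection against the segmented target:

```python
import numpy as np

def rot_y(theta):
    """Rotation about the vertical axis (a single pose parameter)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(verts, theta=0.0, scale=1.0):
    """Orthographic projection: rotate the mesh vertices, drop the
    depth axis, and scale to match the target image's proportions."""
    return scale * (verts @ rot_y(theta).T)[:, :2]

def align_centroids(proj, target_pts):
    """Translate the projected points so their centroid coincides
    with the centroid of the segmented target."""
    return proj + (target_pts.mean(axis=0) - proj.mean(axis=0))

# Stand-in "model": the 8 vertices of a unit cube.
cube = np.array([[x, y, z] for x in (0, 1)
                           for y in (0, 1)
                           for z in (0, 1)], dtype=float)

# Hypothetical 2D points extracted from a segmented target.
target = np.array([[10.0, 5.0], [12.0, 7.0]])

aligned = align_centroids(project(cube, theta=0.3, scale=2.0), target)
```

The translate/rotate/scale iteration would then be an optimization over `theta` and `scale` (plus the two remaining rotation axes for a real pose), minimizing some distance between the aligned projection and the target silhouette.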