[D] New to ML and just wanted to get some advice/clarification
I am currently attempting to learn ML, specifically object detection.
I am usually a JS developer, but I want to extend my knowledge and combine the two at some point.
I hope there are not any stupid questions here.
I am looking to create a project that recognises different elements drawn on a piece of paper, for example rectangles and squares of different dimensions.
I am currently creating the images manually by drawing them myself, and I am worried this data will become too biased because I know what I want the end result to be.
I then feed that data into CV2 to add a randomly generated background to each image, because the first few runs I processed with TensorFlow seemed to pick up mostly the white areas and gave me false readings.
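To illustrate the background step, here is a minimal sketch rather than my exact script. It uses NumPy only. It assumes a grayscale drawing where anything brighter than a made-up `white_threshold` counts as blank paper, and it uses uniform noise as the "random background". The function name and the tiny synthetic page are hypothetical.

```python
import numpy as np

def add_random_background(drawing, rng, white_threshold=200):
    """Replace near-white pixels of a grayscale drawing with random noise."""
    # Pixels brighter than the (assumed) threshold are treated as blank paper.
    paper_mask = drawing > white_threshold
    # Uniform random grey levels stand in for the randomly generated background.
    noise = rng.integers(0, 256, size=drawing.shape, dtype=np.uint8)
    # Keep the pen strokes and swap the paper for noise.
    return np.where(paper_mask, noise, drawing).astype(np.uint8)

# Hypothetical 64x64 white page with a dark rectangle outline drawn on it.
page = np.full((64, 64), 255, dtype=np.uint8)
page[16:48, 16] = 0   # left edge
page[16:48, 47] = 0   # right edge
page[16, 16:48] = 0   # top edge
page[47, 16:48] = 0   # bottom edge

rng = np.random.default_rng(seed=0)
augmented = add_random_background(page, rng)
```

With real scans the array would come from an image loader instead of `np.full`, and the threshold would need tuning for the paper and lighting.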
Once I have the image with its background, I use labelImg.py to draw bounding boxes and label the areas which contain the elements.
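For anyone following along, labelImg saves Pascal VOC style XML by default, so the labels can be read back and sanity-checked with the standard library. This is a sketch under that assumption. The sample annotation, file name and class names below are invented for illustration.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_text):
    """Return (label, xmin, ymin, xmax, ymax) tuples from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes

# Made-up annotation in the Pascal VOC layout, for illustration only.
SAMPLE_XML = """
<annotation>
  <filename>drawing_001.jpg</filename>
  <size><width>300</width><height>300</height><depth>3</depth></size>
  <object>
    <name>rectangle</name>
    <bndbox><xmin>40</xmin><ymin>60</ymin><xmax>220</xmax><ymax>140</ymax></bndbox>
  </object>
  <object>
    <name>square</name>
    <bndbox><xmin>100</xmin><ymin>180</ymin><xmax>180</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""

boxes = read_voc_boxes(SAMPLE_XML)
```

Counting the labels returned for every file is a quick way to spot a class that is badly under-represented.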
My data is then ready for training. Below are the results I received:
(I allowed the training to run through 20,000 steps before stopping and testing)
Data Sets | Success Rate |
---|---|
25 | 75% |
100 | 40% |
2000 | 10% |
As you can see from the table above, when testing against the same test images, the more data sets I introduce, the lower the success rate of recognising the boxes becomes.
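To be explicit about what a "success rate" could mean, here is one possible definition as a sketch. It is an assumption, not necessarily how the figures above were counted. A ground-truth box counts as recognised if some detection overlaps it with an intersection over union (IoU) of at least 0.5. The threshold and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap give an empty intersection.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def success_rate(ground_truth, detections, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one detection."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for gt in ground_truth
        if any(iou(gt, det) >= iou_threshold for det in detections)
    )
    return hits / len(ground_truth)

# Two labelled boxes; the detector finds the first and misses the second.
truth = [(40, 60, 220, 140), (100, 180, 180, 260)]
found = [(45, 62, 215, 138), (10, 10, 30, 30)]
rate = success_rate(truth, found)
```

Using the same fixed definition for every run makes the rows of the table comparable with each other.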
Question time:
Is there a relationship between the number of steps the trainer should run and the number of data sets provided, i.e. should the steps scale in proportion to the data?
e.g.
Data Sets | Steps |
---|---|
25 | 20,000 |
100 | 80,000 |
750 | 600,000 |
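The arithmetic behind that table can be sketched as follows. One training step processes one batch, so the number of passes over the data is roughly steps times batch size divided by the number of images. The batch size of 24 is an assumed value for illustration only, since my edited value is not given here, and "data sets" is read as the number of training images.

```python
def epochs_seen(steps, batch_size, num_images):
    """Approximate number of full passes over the data after `steps` steps."""
    return steps * batch_size / num_images

BATCH_SIZE = 24  # assumed value, not the one from my edited config

# Fixed 20,000 steps, as in the results table: bigger sets get far fewer passes.
fixed_steps = {n: epochs_seen(20_000, BATCH_SIZE, n) for n in (25, 100, 2000)}

# Steps scaled with the data, as in the table above: passes stay constant.
scaled_steps = {
    n: epochs_seen(steps, BATCH_SIZE, n)
    for n, steps in ((25, 20_000), (100, 80_000), (750, 600_000))
}

for n, passes in fixed_steps.items():
    print(f"{n} images at 20,000 steps: about {passes:.0f} passes")
```

Under these assumptions, the scaled table simply keeps the number of passes per image the same for every data set size.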
Is it possible to overtrain, so that the machine only recognises the data sets you have provided?
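One common way to check for this is to hold some labelled images out of training and compare results on them against results on the training images. A minimal sketch of such a split is below, assuming a flat list of image file names. The 20% fraction, the seed and the file names are hypothetical.

```python
import random

def split_dataset(items, val_fraction=0.2, seed=42):
    """Shuffle once with a fixed seed, then split into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one item aside for validation.
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Made-up file names standing in for the labelled drawings.
filenames = [f"drawing_{i:03d}.jpg" for i in range(100)]
train_files, val_files = split_dataset(filenames)
```

If the success rate on `train_files` keeps rising while the rate on `val_files` stalls or falls, that gap is the usual sign of overtraining.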
Is there a place where I can request data sets? I do not mind creating them manually, but for bias reasons I am wondering whether this is possible.
Is there an optimised configuration file for this particular task I am trying to achieve?
I am currently using the ssd_mobilenet_v2_coco config file with some edits to image and batch size.
Is there any advice you would offer an ML noob to help progress their knowledge?
I have been watching quite a few videos on YouTube (most notably Gilbert Tanner and Sentdex) and they are getting amazing results with only a small dataset and 20,000 steps.
Also, just to note, the loss sits under 0.8 whilst training around the 20,000 step mark.
I am hoping the questions above are not too targeted and could also help other people who are starting out with ML.
submitted by /u/PrimeCodas