[D] New to ML and just wanted to get some advice/clarification

Written by torontoai on May 4, 2019. Posted in Reddit MachineLearning.

I am currently attempting to learn ML, specifically object detection.

I am usually a JS developer, but I wanted to extend my knowledge and combine the two at some point.

I hope there are not any stupid questions here.

I am looking to create a project that recognises different elements drawn on a piece of paper, for example rectangles and squares of different dimensions.

I am currently creating the images manually by drawing them myself. I am worried this data will become too biased, as I know what I want the end result to be.

I then feed that data into CV2 to add a randomly generated background to each image, because the first few runs I processed with TensorFlow seemed to pick up mostly the white areas and gave me false readings.
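For anyone wanting to picture this step, here is a minimal sketch of generating a sample with a noisy background and a rectangle whose position is known in advance. It is not my actual CV2 script: it uses plain Python lists instead of image arrays so it runs without any libraries, and `make_sample` and its parameters are made-up names for illustration. Generating the shape in code also means the bounding box comes for free.

```python
import random


def make_sample(width=64, height=64, rng=None):
    """Return (image, box) for one synthetic sample.

    image is a grayscale picture stored as a list of rows (values 0-255).
    box is (xmin, ymin, xmax, ymax) of the rectangle outline drawn on it.
    """
    rng = rng or random.Random()

    # Noisy light background, so a detector cannot simply key on plain white.
    image = [[rng.randint(100, 255) for _ in range(width)] for _ in range(height)]

    # Random rectangle, at least 8 pixels per side, fully inside the image.
    xmin = rng.randint(0, width - 9)
    ymin = rng.randint(0, height - 9)
    xmax = rng.randint(xmin + 8, width - 1)
    ymax = rng.randint(ymin + 8, height - 1)

    # Draw the outline in black (0), which never occurs in the background.
    for x in range(xmin, xmax + 1):
        image[ymin][x] = 0
        image[ymax][x] = 0
    for y in range(ymin, ymax + 1):
        image[y][xmin] = 0
        image[y][xmax] = 0

    return image, (xmin, ymin, xmax, ymax)


image, box = make_sample(rng=random.Random(42))
print("bounding box:", box)
```

Passing a seeded `random.Random` makes a sample reproducible, which helps when comparing training runs.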

Once I have the image with its background, I use labelImg.py to draw bounding boxes and label the areas that contain the elements.
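As a rough illustration of what comes out of this step: labelImg saves Pascal VOC XML by default, and the boxes can be read back with the standard library. The file contents below (`sheet_001.jpg`, the labels and the coordinates) are invented for the example, and `read_voc_boxes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) from Pascal VOC XML text."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are stored as text; go through float in case of "40.0".
        coords = [int(float(bndbox.findtext(tag)))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes


# Invented annotation, shaped like a labelImg Pascal VOC file.
EXAMPLE = """<annotation>
  <filename>sheet_001.jpg</filename>
  <size><width>300</width><height>300</height><depth>3</depth></size>
  <object>
    <name>rectangle</name>
    <bndbox><xmin>40</xmin><ymin>60</ymin><xmax>180</xmax><ymax>120</ymax></bndbox>
  </object>
  <object>
    <name>square</name>
    <bndbox><xmin>200</xmin><ymin>200</ymin><xmax>260</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_boxes(EXAMPLE))
# [('rectangle', 40, 60, 180, 120), ('square', 200, 200, 260, 260)]
```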

My data is then ready for training. Below are the results I received:

(I allowed the training to run through 20,000 steps before stopping and testing)

Data Sets	Success Rate
25	75%
100	40%
2000	10%

As you can see from the table above, when testing against the same test images, the more data sets I introduce, the lower the success rate of recognising the boxes becomes.
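To make "success rate" concrete, here is one possible way to score it, which is not necessarily how the numbers above were produced: count a drawn box as recognised when some predicted box overlaps it with an intersection over union (IoU) of at least 0.5. The threshold and the helper names are assumptions for this sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def success_rate(truth, predicted, threshold=0.5):
    """Fraction of ground-truth boxes matched by a prediction with IoU >= threshold."""
    if not truth:
        return 0.0
    hits = sum(1 for t in truth if any(iou(t, p) >= threshold for p in predicted))
    return hits / len(truth)


truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(1, 1, 10, 10)]  # overlaps the first box only
print(success_rate(truth, predicted))  # 0.5
```

This simple score ignores false positives, so a model that predicts boxes everywhere would still look good; it is only meant to pin down what is being counted.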

Question Time:-

Is there a relationship between the number of steps the trainer should run and the number of data sets provided? In other words, should the steps scale in proportion to the data?

e.g.

Data Sets	Steps
25	20,000
100	80,000
750	600,000
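One way to reason about this is in passes over the data (epochs) rather than raw steps. In the TensorFlow Object Detection API each step processes one batch, so epochs = steps × batch size ÷ number of images. The sketch below assumes "data sets" means labelled training images, and `BATCH_SIZE = 24` is only a placeholder because I have edited mine.

```python
def epochs_seen(num_steps, batch_size, num_images):
    """How many times the trainer passes over the whole training set."""
    return num_steps * batch_size / num_images


BATCH_SIZE = 24  # placeholder assumption; use the value from your own config

print("Steps scaled with the data (proposed table):")
for num_images, num_steps in [(25, 20_000), (100, 80_000), (750, 600_000)]:
    print(num_images, num_steps, epochs_seen(num_steps, BATCH_SIZE, num_images))

print("Fixed 20,000 steps (what I actually ran):")
for num_images in [25, 100, 2000]:
    print(num_images, 20_000, epochs_seen(20_000, BATCH_SIZE, num_images))
```

Whatever the batch size, every row of the proposed table works out to the same number of epochs, while a fixed 20,000 steps gives the 2000 set 80 times fewer passes than the 25 set.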

Is it possible to overtrain, so that the model only recognises the data sets you have provided?

Is there a place where I can request data sets? I do not mind creating them manually, but for bias reasons I am wondering whether that is possible.
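A common way to spot this kind of overtraining (overfitting) is to evaluate on held-out images at regular intervals and watch for the validation loss rising while the training loss keeps falling. Below is a minimal sketch of that check. The loss values are invented, and `best_checkpoint` and `patience` are illustrative names, not part of any library.

```python
def best_checkpoint(val_losses, patience=3):
    """Scan validation losses in evaluation order.

    Returns (best_index, stopped_index). stopped_index is the evaluation at
    which the loss had failed to improve for `patience` evaluations in a row,
    or None if that never happened.
    """
    best_i = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_i]:
            best_i = i
        elif i - best_i >= patience:
            return best_i, i
    return best_i, None


# Invented validation losses, one per evaluation run.
val_losses = [1.2, 0.9, 0.8, 0.85, 0.9, 1.0, 1.1]
print(best_checkpoint(val_losses))  # (2, 5)
```

Here the loss is lowest at evaluation 2 and has not improved by evaluation 5, which suggests the later training is memorising the training images instead of generalising.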

Is there an optimised configuration file for this particular task I am trying to achieve?

I am currently using the ssd_mobilenet_v2_coco config file with some edits to image and batch size.

Is there any advice you would offer an ML noob to help progress their knowledge?

I have been watching quite a few videos on YouTube (most notably Gilbert Tanner and Sentdex) and they are getting amazing results with only a small dataset and 20,000 steps.

Also, just to note, the loss sits under 0.8 while training around the 20,000 step mark.

I am hoping the questions above are not too targeted and could also help other people who are starting out with ML.

submitted by /u/PrimeCodas