[D] New to ML and just wanted to get some advice/clarification

I am currently attempting to learn ML, specifically object detection.

I am normally a JS developer, but I want to extend my knowledge and combine the two at some point.

I hope there are not any stupid questions here.

I am looking to create a project that recognises different elements drawn on a piece of paper, for example rectangles and squares of different dimensions.

I am currently creating the images manually by drawing them myself. I am worried this data will become too biased, as I know what I want the end result to be.

I then feed that data into OpenCV (cv2) to add a randomly generated background to each image, because the first few runs I processed with TensorFlow seemed to pick up mostly the white areas and gave me false readings.
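For anyone following along, here is a minimal sketch of the background-replacement idea. It is not the poster's actual script: it uses a plain nested list as a stand-in for a grayscale image so it runs without any dependencies (OpenCV images are NumPy arrays, where the same thresholding idea applies). The function name, the brightness threshold, and the noise range are all assumptions made for illustration.

```python
import random

WHITE_THRESHOLD = 200    # assumption: pixels brighter than this are blank paper
NOISE_RANGE = (60, 255)  # assumption: keep noise lighter than the ink

def add_random_background(image, rng, threshold=WHITE_THRESHOLD,
                          noise_range=NOISE_RANGE):
    """Return a copy of a grayscale image (rows of 0-255 ints) in which
    bright 'paper' pixels are replaced by random noise and darker 'ink'
    pixels are kept unchanged."""
    low, high = noise_range
    return [
        [rng.randint(low, high) if pixel > threshold else pixel
         for pixel in row]
        for row in image
    ]

# Tiny hypothetical drawing: a 2x2 black square on white paper.
drawing = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]

noisy = add_random_background(drawing, random.Random(42))
```

Passing in a seeded `random.Random` keeps the generated backgrounds reproducible between runs, which makes it easier to compare training results.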

Once I have the background image I use labelImg.py to draw borders and label the areas which contain the elements.
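As a sanity check on the labelling step, the sketch below reads bounding boxes back out of an annotation file. It assumes the annotations were saved in labelImg's Pascal VOC XML format; the sample XML, file name, and function names are hypothetical and only there to make the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout.
SAMPLE_XML = """
<annotation>
  <filename>drawing_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>rectangle</name>
    <bndbox><xmin>50</xmin><ymin>60</ymin><xmax>250</xmax><ymax>160</ymax></bndbox>
  </object>
  <object>
    <name>square</name>
    <bndbox><xmin>300</xmin><ymin>100</ymin><xmax>400</xmax><ymax>200</ymax></bndbox>
  </object>
</annotation>
"""

def read_boxes(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bnd = obj.find("bndbox")
        coords = [int(bnd.findtext(tag))
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes

def suspect_boxes(boxes, width, height):
    """Return boxes that are empty or extend outside the image."""
    return [
        b for b in boxes
        if b[1] >= b[3] or b[2] >= b[4]
        or b[1] < 0 or b[2] < 0 or b[3] > width or b[4] > height
    ]

boxes = read_boxes(SAMPLE_XML)
for label, xmin, ymin, xmax, ymax in boxes:
    print(label, xmax - xmin, "x", ymax - ymin)
```

Running a check like `suspect_boxes` over every annotation before training is a cheap way to catch mislabelled or degenerate boxes, which become harder to spot by eye as the number of images grows.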

My data is then ready for training. Below are the results I received:

(I allowed the training to run through 20,000 steps before stopping and testing)

Data Sets    Success Rate
25           75%
100          40%
2000         10%

As the table above shows, when testing against the same test images, the more data sets I introduce, the lower the success rate of recognising the boxes becomes.
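The post does not say how the success rate was measured. One common convention in object detection, sketched here purely as an assumption, is to count a ground-truth box as recognised when a predicted box with the same label overlaps it with an intersection over union (IoU) of at least 0.5. The function names and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def success_rate(ground_truth, predictions, threshold=0.5):
    """Fraction of ground-truth (label, box) pairs matched by an unused
    prediction with the same label and IoU >= threshold."""
    if not ground_truth:
        return 0.0
    used = set()
    hits = 0
    for gt_label, gt_box in ground_truth:
        for i, (label, box) in enumerate(predictions):
            if (i not in used and label == gt_label
                    and iou(gt_box, box) >= threshold):
                used.add(i)
                hits += 1
                break
    return hits / len(ground_truth)

# Hypothetical test image: the square is found, the rectangle is missed.
truth = [("square", (0, 0, 10, 10)), ("rectangle", (20, 20, 40, 30))]
preds = [("square", (1, 1, 10, 10)), ("rectangle", (100, 100, 120, 110))]
print(success_rate(truth, preds))
```

Scoring every run with the same rule on the same held-out images makes the percentages in the table directly comparable across training set sizes.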

Question Time:-

Is there a relationship between the number of steps the trainer should run and the number of data sets provided? In other words, should the step count scale up in proportion to the data?

e.g.

Data Sets    Steps
25           20,000
100          80,000
750          600,000
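The arithmetic behind this table can be sketched as follows. The table implies a fixed ratio of 800 steps per data set (20,000 / 25). Separately, since one training step processes one batch, the number of full passes over the data (epochs) is steps × batch size / number of images. The sketch assumes each "data set" is one labelled image, and the batch size of 24 is a hypothetical value because the post does not state the one actually used.

```python
STEPS_PER_DATA_SET = 800   # ratio implied by the table: 20,000 / 25
ASSUMED_BATCH_SIZE = 24    # hypothetical; not stated in the post

def proportional_steps(num_data_sets, steps_per_item=STEPS_PER_DATA_SET):
    """Steps needed if the step count scales linearly with the data."""
    return num_data_sets * steps_per_item

def epochs_seen(steps, batch_size, num_images):
    """Number of full passes over the data after `steps` training steps."""
    return steps * batch_size / num_images

# Reproduce the proportional table above.
for n in (25, 100, 750):
    print(n, proportional_steps(n))

# With a fixed 20,000 steps, larger data sets are seen fewer times.
for n in (25, 100, 2000):
    print(n, epochs_seen(20_000, ASSUMED_BATCH_SIZE, n))
```

Thinking in epochs rather than raw steps makes runs with different amounts of data easier to compare, because it shows how many times each image was actually seen.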

Is it possible to overtrain, so that the machine only recognises the data sets you have provided?

Is there a place I can request data sets? I do not mind creating them manually, but for bias reasons I am wondering whether that is possible.
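One common way to check for this kind of overtraining (overfitting) is to hold back some labelled images that the trainer never sees and watch the loss on them. The sketch below is an illustration under stated assumptions, not part of the poster's pipeline: the file names, the 20% split, and the patience value are all hypothetical.

```python
import random

def split_dataset(items, val_fraction=0.2, seed=0):
    """Shuffle and split items into (train, validation) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = max(1, int(len(items) * val_fraction))
    return items[n_val:], items[:n_val]

def stopped_improving(val_losses, patience=3):
    """True if the validation loss has not reached a new minimum
    in the last `patience` evaluations."""
    if not val_losses:
        return False
    best_index = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_index >= patience

# Hypothetical file names standing in for the labelled drawings.
files = [f"drawing_{i:03d}.jpg" for i in range(100)]
train_files, val_files = split_dataset(files)
print(len(train_files), len(val_files))

# Hypothetical validation losses recorded at successive checkpoints.
print(stopped_improving([1.0, 0.8, 0.7, 0.75, 0.8, 0.9]))
```

If the training loss keeps falling while the held-out loss stops improving, the model is probably memorising the training images rather than learning shapes in general.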

Is there an optimised configuration file for this particular task I am trying to achieve?

I am currently using the ssd_mobilenet_v2_coco config file with some edits to image and batch size.

Is there any advice you would offer an ML noob to help progress their knowledge?

I have been watching quite a few videos on YouTube (most notably Gilbert Tanner and Sentdex) and they are getting amazing results with only a small dataset and 20,000 steps.

Also, just to note, the loss sits under 0.8 around the 20,000-step mark during training.

I am hoping the questions above are not too targeted and could also help other people who are starting out with ML.

submitted by /u/PrimeCodas