[D] Best way to label/prepare data for full-body gesture recognition.

Sorry, I’m pretty much a newbie at this kind of thing. Long story short, I have pose data obtained from OpenPose and I want to recognize certain gestures using an LSTM-RNN. A gesture is N consecutive poses obtained from the camera. Some of the gestures: walking, sweeping, idling, bringing an object.
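For reference, here’s roughly what I mean by a gesture window, as a minimal sketch rather than my exact pipeline (the file layout, window length, and stride below are just placeholders):

    import json
    import glob
    import numpy as np

    # Assumed layout: one OpenPose JSON file per frame, first detected person only.
    # BODY_25 gives 25 keypoints as a flat [x, y, confidence] list per person.
    N = 30          # frames per gesture window (placeholder)
    STRIDE = 10     # step between consecutive windows (placeholder)

    def load_pose(path):
        """Return a (25, 2) array of (x, y) keypoints, or None if nobody was detected."""
        with open(path) as f:
            data = json.load(f)
        if not data["people"]:
            return None
        kp = np.array(data["people"][0]["pose_keypoints_2d"]).reshape(-1, 3)
        return kp[:, :2]  # drop the per-keypoint confidence

    def make_windows(frame_paths):
        """Stack frames into overlapping windows of N poses, flattened to (N, 50)."""
        poses = [p for p in (load_pose(f) for f in sorted(frame_paths)) if p is not None]
        poses = np.stack(poses)                      # (num_frames, 25, 2)
        windows = []
        for start in range(0, len(poses) - N + 1, STRIDE):
            windows.append(poses[start:start + N].reshape(N, -1))
        return np.stack(windows)                     # (num_windows, N, 50)

    # Example: X = make_windows(glob.glob("session_01/*_keypoints.json"))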

Training-wise it’s not really a problem: I get 98% on the training set and 90%+ on the test set. Data-wise it also shouldn’t be a problem, since I already have millions of poses (the day I realized the Google Sheets cell limit is too small, LOL). But running it on my real-time data always shows how bad it is, which makes me think it’s learning the wrong kind of features.
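In case it matters, the model is basically a stock sequence classifier along these lines (a minimal sketch, not my exact architecture or hyperparameters):

    import numpy as np
    from tensorflow.keras import layers, models

    N = 30            # frames per gesture (placeholder)
    FEATURES = 50     # 25 keypoints * (x, y)
    NUM_CLASSES = 4   # walking, sweeping, idling, bringing an object

    model = models.Sequential([
        layers.Input(shape=(N, FEATURES)),
        layers.Masking(mask_value=0.0),   # skip zero-padded frames, if any
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=20)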

What I’ve done:

  • With no pre-processing at all, I separated the poses into 4 regions: NE, NW, SE, SW (as in “north-east”, etc.), which basically means the poses happen in 4 very different locations in the image. Result: the LSTM doesn’t even recognize the separated regions consistently, let alone the specific gestures.

  • Amplifying the real distance between the 4 regions by giving them a large offset once a certain boundary is passed. Again, same result.

  • Normalizing all of the poses to the origin, so no translation info is present in the data (inference & training). Yet, even after simplifying the data down to two classes, it’s still very bad.

  • Normalizing only the first pose of a gesture to the origin, so the remaining N-1 poses move relative to the origin (to preserve translation data). Even worse than the previous iteration on the test set, giving 85%. (A rough sketch of these last two variants follows this list.)
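To be concrete about those last two bullets, this is a simplified sketch of the two normalization variants; the choice of root joint (mid-hip) is just a placeholder, not necessarily what I actually subtract:

    import numpy as np

    ROOT = 8  # BODY_25 mid-hip index, used as the "origin" joint (placeholder)

    def normalize_every_pose(seq):
        """Variant 3: subtract the root joint from every frame, so no translation survives."""
        # seq: (N, 25, 2) window of poses
        return seq - seq[:, ROOT:ROOT + 1, :]

    def normalize_first_pose_only(seq):
        """Variant 4: subtract only the first frame's root, so motion relative to the start is kept."""
        return seq - seq[0:1, ROOT:ROOT + 1, :]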

At this point, I’m afraid I’m doing something fundamentally wrong with my data. What I’m worried about:

  • I divide my gestures by region instead of by the direction they’re moving.

  • The model just isn’t cut out for it.

  • The movement from the noise is way too big compared to the actual gesture.

I humbly ask for any assistance at this point LOL.

submitted by /u/ArsenicBismuth