[D] Best way to label/prepare data for full-body gesture recognition.
Sorry, I’m pretty much a newbie to this kind of thing. Long story short, I have pose data obtained from OpenPose and I want to recognize certain gestures using an LSTM-RNN. A gesture is N consecutive poses obtained from the camera. Some of the gestures: walking, sweeping, idling, bringing an object.
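To be concrete, this is roughly how I slice the pose stream into gesture samples (a minimal sketch, not my actual code; the BODY_25 format with 25 (x, y, confidence) keypoints per frame and the `make_sequences` name are just assumptions for illustration):

```python
import numpy as np

def make_sequences(poses, frame_labels, n=30):
    """Slice a long pose stream into fixed-length windows of N frames,
    one window per LSTM sample.

    poses: (num_frames, 25, 3) OpenPose BODY_25 keypoints, (x, y, confidence)
    frame_labels: (num_frames,) gesture label per frame
    Returns (num_windows, n, 50) inputs and (num_windows,) labels.
    """
    xy = poses[:, :, :2].reshape(len(poses), -1)  # drop confidence -> (num_frames, 50)
    windows, labels = [], []
    for start in range(0, len(xy) - n + 1, n):    # non-overlapping windows
        windows.append(xy[start:start + n])
        labels.append(frame_labels[start + n - 1])  # label by the window's last frame
    return np.stack(windows), np.array(labels)
```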
Training-wise it’s pretty much not a problem: I get 98% on the training set and 90%+ on the test set. Data-wise it also shouldn’t be a problem, since I already have millions of poses (the day I realized the Google Sheets cell limit is too small LOL). But running it on my real-time data always shows how bad it actually is, which makes me think it’s learning the wrong kind of features.
What I’ve done:
- With no pre-processing at all, I separated the poses into 4 regions: NE, NW, SE, SW (as in “north-east”, etc.), meaning the poses happen in 4 very different locations on the image. Result: the LSTM doesn’t even recognize the separated regions consistently, let alone the gestures.
- Amplifying the real distance between the 4 regions by giving the poses a large offset once a certain boundary is passed (a sketch of both of these is after this list). Again, same result.
- Normalizing all of the poses to the origin, so no translation info is present in the data (inference & training). Yet, even after simplifying the problem to two classes, it’s still bad.
- Normalizing only the first pose of a gesture to the origin, so the remaining N-1 poses move relative to it (to preserve translation data); see the second sketch after this list. Even worse than the previous iteration on the test set, giving 85%.
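For the region splitting and the offset amplification, this is the gist of what I do (a simplified sketch; the quadrant rule, the mean-keypoint centroid, and the offset value are just illustrative assumptions):

```python
import numpy as np

def region_of(pose, img_w, img_h):
    """Assign a pose (25, 2) to NE/NW/SE/SW by which image quadrant
    its mean keypoint position falls into (image y grows downward)."""
    cx, cy = pose[:, 0].mean(), pose[:, 1].mean()
    return ("N" if cy < img_h / 2 else "S") + ("E" if cx >= img_w / 2 else "W")

def amplify_regions(pose, img_w, img_h, offset=5000.0):
    """Push the 4 quadrants far apart: once a pose crosses the
    mid-image boundary, add a large constant offset to its coordinates."""
    shifted = pose.copy()
    cx, cy = pose[:, 0].mean(), pose[:, 1].mean()
    if cx >= img_w / 2:
        shifted[:, 0] += offset
    if cy >= img_h / 2:
        shifted[:, 1] += offset
    return shifted
```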
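And the two normalization variants (again a sketch; I’m assuming the BODY_25 mid-hip, index 8, as the root joint, and `seq` of shape (N, 25, 2)):

```python
def normalize_per_frame(seq, root=8):
    """Variant 3: subtract the root joint from every frame independently,
    so all translation info is removed from the sequence."""
    return seq - seq[:, root:root + 1, :]

def normalize_first_frame(seq, root=8):
    """Variant 4: subtract only the first frame's root joint from the whole
    sequence, so the gesture starts at the origin but keeps its translation."""
    return seq - seq[0:1, root:root + 1, :]
```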
At this point, I’m afraid I’m doing something fundamentally wrong with my data. What I’m worried about:
- I’m dividing my gestures by region instead of by the direction they’re moving.
- The model just isn’t cut out for it.
- The movement caused by keypoint noise is way too big compared to the actual gesture.
I humbly ask for any assistance at this point LOL.