[D] Best way to label/prepare data for full-body gesture recognition.
Sorry, I’m pretty much a newbie at this kind of thing. Long story short, I have pose data obtained from OpenPose and I want to recognize certain gestures using an LSTM-RNN. A gesture would be N consecutive poses obtained from a camera. Some of the gestures: walking, sweeping, idling, bringing an object.
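For reference, a rough sketch of how "N consecutive poses" can be cut out of a recording. The array layout `(frames, keypoints, 2)`, the 25-keypoint count (as in OpenPose's BODY_25 model), and the names `make_windows`, `n`, and `stride` are assumptions for illustration, not necessarily the exact setup used here.

```python
import numpy as np

def make_windows(poses, n, stride=1):
    """Slice a (T, J, 2) pose sequence into (num_windows, n, J, 2) gesture windows."""
    poses = np.asarray(poses, dtype=np.float32)
    if len(poses) < n:
        # Not enough frames for even one window
        return np.empty((0, n) + poses.shape[1:], dtype=np.float32)
    starts = range(0, len(poses) - n + 1, stride)
    return np.stack([poses[s:s + n] for s in starts])

# Placeholder data: 100 frames of 25 keypoints with (x, y) coordinates
frames = np.random.rand(100, 25, 2)
windows = make_windows(frames, n=30, stride=10)
print(windows.shape)  # (8, 30, 25, 2)
```

With a small `stride`, neighbouring windows share most of their frames, so it matters whether windows from the same recording end up on both sides of the train/test split.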
Training-wise, it’s not a problem reaching 98% on the training set and 90%+ on the test set. Data-wise it also shouldn’t be a problem, since I already have millions of poses (the day I realized the Google Sheets cell limit is too small LOL). But running it on my real-time data always shows how bad it is, which makes me think it’s learning the wrong kind of features.
What I’ve done:
With no pre-processing at all, I separated the poses into 4 regions: NE, NW, SE, SW (as in “north-east”, etc.), which basically means the poses happen in 4 very different locations in the image. Result: the LSTM doesn’t even recognize the region consistently, let alone the actual gestures.
Amplifying the real distance between the 4 regions by adding a large offset whenever a pose crosses a certain boundary. Again, same result.
Normalizing all of the poses to the origin. This way no translation info is present in the data (inference & training). Yet, even after simplifying the data to two classes, it’s still really bad.
Normalizing only the first pose of a gesture to the origin, so the remaining N-1 poses move relative to the origin (to preserve translation data). Even worse than the previous iteration on the test set, giving 85%.
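To make the two normalization variants concrete, here is a minimal sketch. It assumes each gesture is an `(N, J, 2)` array and uses keypoint 8 (MidHip in OpenPose's BODY_25 layout) as the "origin" joint; the function names and the choice of root joint are illustrative assumptions, not the exact code used here.

```python
import numpy as np

def center_every_pose(window, root=8):
    """Subtract the root joint from every frame: removes all translation."""
    return window - window[:, root:root + 1, :]

def center_on_first_pose(window, root=8):
    """Subtract only the first frame's root joint: keeps motion relative to the start."""
    return window - window[0, root, :]

# Toy gesture: 5 frames, all joints drifting 2 px per frame along x
window = np.zeros((5, 25, 2))
window[:, :, 0] = 100 + 2 * np.arange(5)[:, None]
window[:, :, 1] = 200

print(center_every_pose(window)[:, 8])      # root stays at (0, 0) in every frame
print(center_on_first_pose(window)[-1, 8])  # root ends at (8, 0): translation kept
```

Whichever variant is used, the same function has to be applied to the training windows and to the live camera windows, otherwise the model sees differently scaled inputs at inference time.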
At this point, I’m afraid I’m doing something fundamentally wrong with my data. What I’m worried about:
I divide my gestures by region instead of by the direction the movement is going.
The model just isn’t cut out for it.
The movement from noise is way too big compared to the actual gesture.
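Two of these worries can be checked numerically. The sketch below is only one possible way to do it: it labels a window by the direction of its net displacement rather than by image region, and it estimates jitter as the spread of per-frame steps around the mean step. The root joint index, the `min_move` threshold (in pixels), and the function names are all hypothetical choices.

```python
import numpy as np

def motion_stats(window, root=8):
    """Return (net_displacement, jitter) for the root joint of an (N, J, 2) window."""
    track = window[:, root, :]
    net = track[-1] - track[0]
    steps = np.diff(track, axis=0)
    # Jitter: mean deviation of each per-frame step from the average step,
    # i.e. motion that is not explained by a steady drift
    jitter = float(np.linalg.norm(steps - steps.mean(axis=0), axis=1).mean())
    return net, jitter

def direction_label(net, min_move=5.0):
    """Coarse direction from net displacement in image coordinates (y grows downward)."""
    if np.linalg.norm(net) < min_move:
        return "still"
    dx, dy = net
    if abs(dx) >= abs(dy):
        return "east" if dx > 0 else "west"
    return "south" if dy > 0 else "north"

# Toy gesture: 5 frames moving steadily 2 px per frame along x
window = np.zeros((5, 25, 2))
window[:, :, 0] = 100 + 2 * np.arange(5)[:, None]
net, jitter = motion_stats(window)
print(direction_label(net), jitter)  # east 0.0
```

If the jitter on real recordings turns out to be comparable to the length of `net`, that would support the noise worry; if it is much smaller, the problem is more likely elsewhere.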
I humbly ask for any assistance at this point LOL.