[D] How to train a network to detect a specific pose or gesture (e.g. raising both hands) with a limited amount of data?
I tried using a pose detection network and inferring the gesture from its output. This lets me recognise different gestures without retraining the network. But running a full pose detection network is expensive, so it doesn't work in my operating environment (an embedded device).
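To make "inferring the gesture from the output" concrete, here is a minimal sketch of the kind of rule I mean. It assumes COCO-style 17-keypoint output given as (x, y, confidence) in image coordinates with y growing downward; the index constants, the function name and the `min_conf` threshold are illustrative assumptions, not tied to any particular pose network.

```python
# Assumed COCO-style keypoint indices (an assumption, adjust for your pose model).
L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST = 5, 6, 9, 10

def both_hands_raised(keypoints, min_conf=0.3):
    """Return True if both wrists are above their shoulders.

    keypoints: sequence of (x, y, confidence) tuples in image
    coordinates, where y increases towards the bottom of the image.
    """
    needed = (L_SHOULDER, R_SHOULDER, L_WRIST, R_WRIST)
    # Refuse to decide if any required joint is a low-confidence detection.
    if any(keypoints[i][2] < min_conf for i in needed):
        return False
    # "Above" means a smaller y value in image coordinates.
    left_up = keypoints[L_WRIST][1] < keypoints[L_SHOULDER][1]
    right_up = keypoints[R_WRIST][1] < keypoints[R_SHOULDER][1]
    return left_up and right_up
```

Changing the gesture only means changing this rule, which is why no retraining is needed with this approach.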
Since I don't actually need to redefine the posture I want to detect very often, I had the idea of using a modified YOLO network to simultaneously detect a person and classify whether that person is raising both hands.
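One simple way to set this up is to treat "person" and "person raising both hands" as two detection classes, so each training label carries the gesture as its class id. The sketch below writes one label line in the common YOLO text format (`class x_center y_center width height`, all normalised to the image size). The class ids and the helper name `to_yolo_label` are hypothetical, and this is only one possible reading of "modified YOLO"; a separate classification output per box would be another.

```python
def to_yolo_label(box, img_w, img_h, hands_raised):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    Class ids are an assumption for illustration:
    0 = person, 1 = person raising both hands.
    """
    x_min, y_min, x_max, y_max = box
    cls = 1 if hands_raised else 0
    # YOLO labels store the box centre and size, normalised to [0, 1].
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For example, `to_yolo_label((100, 50, 300, 450), 400, 500, True)` gives `1 0.500000 0.500000 0.500000 0.800000`.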
This works to some extent. However, this approach requires me to collect a lot of data of people raising their hands, and it is quite difficult to collect much data with any real variety.
Is there a way to work around the data problem? Many thanks.