[P] Playing RTS games with audio recognition instead of using hands for input
So about two years ago I started getting shoulder aches, but I still wanted to play RTS games. That’s when I started working on a project to allow me to play certain games without using my hands.
At first it used 100ms audio clips with a slow 80ms delay afterwards before responding to inputs. By now I’ve brought it down to 50ms audio with a response time of 10ms.
I’m also using an eye tracker to move the mouse around, so that it’s completely hands-free.
A demo where I’m using the program to play Starcraft 2 can be found here with all the controls explained during the video:
The project has the recording tools needed for data collection, using a sliding window over the microphone input to generate 50ms audio files every 25ms.
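To make the windowing concrete, here is a minimal sketch of a 50ms window sliding forward in 25ms steps. It is not the code from the repository: the function name `sliding_windows`, the 16kHz sample rate and the in-memory list of samples are all assumptions for illustration.

```python
def sliding_windows(samples, sample_rate, window_ms=50, hop_ms=25):
    """Yield (start_index, window) pairs: window_ms of audio every hop_ms."""
    window_len = sample_rate * window_ms // 1000
    hop_len = sample_rate * hop_ms // 1000
    for start in range(0, len(samples) - window_len + 1, hop_len):
        yield start, samples[start:start + window_len]


# One second of silence at an assumed 16 kHz sample rate.
audio = [0.0] * 16000
windows = list(sliding_windows(audio, sample_rate=16000))
```

Because the hop is half the window length, every sample (apart from the very edges) ends up in two consecutive windows, which is what lets a new 50ms clip come out every 25ms.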
I added some simple thresholding filters so that I can more easily capture the right audio samples while I am recording them (sibilants can get by with just a pitch threshold, while others like finger snaps work best with a high peak-to-peak threshold).
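A rough sketch of what such recording filters could look like is below. The helper names are hypothetical, and the zero-crossing count is only a crude stand-in for a pitch estimate that I picked for this example; the repository may measure pitch differently.

```python
import math


def peak_to_peak(window):
    """Difference between the loudest positive and negative sample."""
    return max(window) - min(window)


def zero_crossing_frequency(window, sample_rate):
    """Crude pitch proxy (assumption): sign changes per second, halved."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    duration = len(window) / sample_rate
    return crossings / (2 * duration)


def passes_filter(window, sample_rate, min_frequency=0.0, min_peak_to_peak=0.0):
    """Keep a window only if it clears both (hypothetical) thresholds."""
    return (zero_crossing_frequency(window, sample_rate) >= min_frequency
            and peak_to_peak(window) >= min_peak_to_peak)


def tone(frequency, sample_rate=16000, length=800):
    """Synthetic test tone with a small phase offset to avoid exact zeros."""
    return [math.sin(2 * math.pi * frequency * n / sample_rate + 0.5)
            for n in range(length)]


hiss = tone(4000)                 # high-pitched, sibilant-like
hum = tone(200)                   # low-pitched background hum
snap = [0.01] * 800
snap[400], snap[401] = 0.9, -0.8  # short, sharp transient
```

With these, `passes_filter(hiss, 16000, min_frequency=3000)` keeps the hiss and rejects the hum, while `passes_filter(snap, 16000, min_peak_to_peak=1.0)` keeps the snap and rejects a quiet window.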
I’m using an ensemble of four-layer neural nets to do the recognition part, plus some post-processing to make sure keyboard inputs are done at the proper times with as few misclicks as possible.
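One common way to combine an ensemble is to average the per-label probabilities of every net and pick the highest. The sketch below assumes that approach, which may not be exactly how the repository combines its models, and uses stub functions in place of trained networks.

```python
def ensemble_predict(models, features):
    """Average per-label probabilities over all models and pick the best label."""
    totals = {}
    for model in models:
        for label, probability in model(features).items():
            totals[label] = totals.get(label, 0.0) + probability
    averaged = {label: total / len(models) for label, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


# Stub "models" (hypothetical): each maps features to label probabilities.
def model_a(features):
    return {"snap": 0.9, "silence": 0.1}


def model_b(features):
    return {"snap": 0.6, "silence": 0.4}


def model_c(features):
    return {"snap": 0.3, "silence": 0.7}


label, probabilities = ensemble_predict([model_a, model_b, model_c], features=None)
```

Here one net disagrees with the other two, but the averaged probability still favours "snap", so a single unsure model does not flip the output.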
I validate out-of-sample performance by recording some more sounds and analysing the outputs of the model in a few graphs ( https://github.com/chaosparrot/parrot.py/blob/master/docs/ANALYSING.md ).
I tweak the post-processing after playing a match in a game: I alter the thresholds for input activation based on my experience during it (maybe I felt the SHIFT key was pressed too late, or another key was way too trigger-happy) by analysing the model output of the match in the CSV output of the recognitions.
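As an illustration of that tuning loop, the sketch below replays a recognition log against two sets of activation thresholds. The column names (`time`, `label`, `probability`), the sample rows and the threshold values are made up for this example and are not the repository's actual CSV layout.

```python
import csv
import io


def activations(rows, thresholds):
    """Return (time, label) for every row whose probability meets its threshold."""
    fired = []
    for row in rows:
        label = row["label"]
        # Labels without a configured threshold never fire (default 1.0 here).
        if float(row["probability"]) >= thresholds.get(label, 1.0):
            fired.append((float(row["time"]), label))
    return fired


# Hypothetical log of one window every 25ms.
log = io.StringIO(
    "time,label,probability\n"
    "0.000,shift,0.55\n"
    "0.025,shift,0.72\n"
    "0.050,click,0.91\n"
    "0.075,click,0.64\n"
)
rows = list(csv.DictReader(log))

strict = activations(rows, {"shift": 0.7, "click": 0.9})
relaxed = activations(rows, {"shift": 0.5, "click": 0.9})
```

Lowering the SHIFT threshold from 0.7 to 0.5 makes it fire one window (25ms) earlier in this log, while the click threshold stays high enough to ignore the weaker 0.64 row.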
The program is multithreaded to ensure that I don’t lose audio recordings during the feature-engineering/evaluation phase.
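The general pattern here is a producer/consumer queue: one thread keeps pushing audio chunks while another does the slower processing, so recording never has to wait. The sketch below shows that pattern with Python's standard library; the function names and the fake "audio" are placeholders, not the repository's code.

```python
import queue
import threading


def record(audio_queue, chunks):
    """Producer: hand over each chunk as soon as it is 'recorded'."""
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(None)  # sentinel: no more audio


def recognise(audio_queue, results):
    """Consumer: process chunks in order without blocking the recorder."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        # Stand-in for feature engineering + model evaluation.
        results.append(max(chunk) - min(chunk))


audio_queue = queue.Queue()
results = []
chunks = [[0.0, 0.5, -0.5], [0.1, 0.1, 0.1], [1.0, -1.0, 0.0]]

recorder = threading.Thread(target=record, args=(audio_queue, chunks))
recogniser = threading.Thread(target=recognise, args=(audio_queue, results))
recorder.start()
recogniser.start()
recorder.join()
recogniser.join()
```

The queue buffers chunks whenever recognition falls behind, so a slow evaluation delays the output but does not drop any audio.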
All the code can be found on GitHub: https://github.com/chaosparrot/parrot.py
As for the future, I think I want to make it record 30ms sounds read at 60Hz (a new window roughly every 17ms), and maybe fool around with some CNNs to see if they improve the recognition.
Considering I also control the data collection, I can just add a few thousand more samples of certain sounds, so I might try training with 5000 samples per label instead of 1500.