
[P] Playing RTS games with audio recognition instead of using hands for input

So about two years ago I started getting shoulder aches, but I still wanted to play RTS games. That’s when I started working on a project to allow me to play certain games without using my hands.

At first it used 100ms audio clips with a slow 80ms delay afterwards before responding to inputs. I've since brought it down to 50ms clips with a 10ms response time.

I'm also using an eye tracker to move the mouse, so the setup is completely hands-free.

A demo of me using the program to play StarCraft 2, with all the controls explained during the video, can be found here:

The project includes the recording tools needed for data collection. They use a sliding window over the microphone input to generate 50ms audio files every 25ms.

I added some simple thresholding filters to make it easier to capture the right audio samples while recording (sibilants can get by with just a pitch threshold, while others, like finger snaps, work best with high peak-to-peak thresholds).
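A filter of that kind might look like the sketch below. It assumes windows arrive as NumPy arrays, and it uses the frequency of the strongest FFT bin as a crude stand-in for "pitch"; the post doesn't say how pitch is measured, and the function names and threshold parameters are hypothetical.

```python
import numpy as np
from typing import Optional


def peak_to_peak(window: np.ndarray) -> float:
    # Distance between the highest and lowest sample; sharp sounds like snaps score high.
    return float(window.max() - window.min())


def dominant_frequency(window: np.ndarray, sample_rate: int) -> float:
    # Frequency (Hz) of the strongest FFT bin, used here as a crude pitch proxy.
    spectrum = np.abs(np.fft.rfft(window))
    freqs = np.fft.rfftfreq(len(window), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


def keep_sample(window: np.ndarray, sample_rate: int,
                min_ptp: Optional[float] = None,
                min_freq: Optional[float] = None) -> bool:
    """Return True only if the window passes every threshold that is set."""
    if min_ptp is not None and peak_to_peak(window) < min_ptp:
        return False
    if min_freq is not None and dominant_frequency(window, sample_rate) < min_freq:
        return False
    return True
```

For a sibilant label you would set only `min_freq` (high-frequency hiss), and for finger snaps only `min_ptp`. Windows that fail are simply not saved as training samples.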

I'm using an ensemble of four-layer neural nets for the recognition, followed by some post-processing to make sure keyboard inputs happen at the right times with as few misclicks as possible.
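The post doesn't say how the ensemble members are combined. One common option is soft voting, where each model's per-label probabilities are averaged and the top label wins. The sketch below assumes that scheme, with each member reduced to a plain callable, so any trained network (for example a four-layer MLP) could be plugged in. The names are hypothetical.

```python
import numpy as np
from typing import Callable, Sequence, Tuple

# Each ensemble member maps a feature vector to one probability per label.
Model = Callable[[np.ndarray], np.ndarray]


def ensemble_predict(models: Sequence[Model], features: np.ndarray,
                     labels: Sequence[str]) -> Tuple[str, float]:
    """Soft voting: average the members' probability vectors and return
    the winning label together with its averaged probability."""
    probs = np.mean([model(features) for model in models], axis=0)
    best = int(np.argmax(probs))
    return labels[best], float(probs[best])
```

Averaging probabilities rather than taking a majority of hard votes keeps a confidence score around, which is what the threshold-based post-processing can then act on.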

I validate out-of-sample performance by recording some additional sounds and analysing the model's outputs in a few graphs.

I tune the post-processing after playing a match. I analyse the model output for the match alongside the CSV output of the recognitions, then adjust the thresholds for input activation based on my experience during it (maybe I felt the SHIFT key was pressed too late, or another key was way too trigger-happy).
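A per-key threshold pass over a recognition log could look like the sketch below. It is only an illustration: the CSV columns `time`, `label` and `probability`, the 0.9 default threshold and the function names are all assumptions, not the project's actual format.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def activations(rows: Iterable[dict], thresholds: Dict[str, float],
                default: float = 0.9) -> List[Tuple[float, str]]:
    """Return (time, label) for every row whose probability clears that
    label's threshold. Column names are hypothetical."""
    fired = []
    for row in rows:
        label = row["label"]
        if float(row["probability"]) >= thresholds.get(label, default):
            fired.append((float(row["time"]), label))
    return fired


def load_activations(csv_path: str, thresholds: Dict[str, float],
                     default: float = 0.9) -> List[Tuple[float, str]]:
    # Replay a saved match log with a candidate set of thresholds.
    with open(csv_path, newline="") as f:
        return activations(csv.DictReader(f), thresholds, default)
```

Replaying the same log with different thresholds shows the trade-off directly. Lowering the SHIFT threshold makes it fire earlier, and raising a trigger-happy key's threshold removes its spurious presses.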

The program is multithreaded to ensure that I don’t lose audio recordings during the feature-engineering/evaluation phase.
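A minimal sketch of that separation is a producer/consumer pair joined by `queue.Queue`. The recorder thread only enqueues chunks, so slow feature engineering or evaluation on the other thread never makes it miss audio. The post doesn't describe how its threads communicate, so the queue, the `None` sentinel and all the names here are assumptions, and the microphone is simulated with a list of byte chunks.

```python
import queue
import threading
import time
from typing import List


def record(audio_queue: queue.Queue, chunks: List[bytes]) -> None:
    """Producer: stands in for the microphone reader. It only enqueues,
    so it never waits on the evaluator."""
    for chunk in chunks:
        audio_queue.put(chunk)
    audio_queue.put(None)  # sentinel: recording finished


def evaluate(audio_queue: queue.Queue, results: List[int]) -> None:
    """Consumer: does the slow feature-engineering/evaluation work."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        time.sleep(0.002)  # simulated processing cost
        results.append(len(chunk))  # placeholder for a model prediction


def run(chunks: List[bytes]) -> List[int]:
    audio_queue: queue.Queue = queue.Queue()  # unbounded, so put() never blocks
    results: List[int] = []
    consumer = threading.Thread(target=evaluate, args=(audio_queue, results))
    producer = threading.Thread(target=record, args=(audio_queue, chunks))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results
```

Because the queue is unbounded, a backlog just means recognition lags behind briefly; no chunks are dropped, and every chunk is processed in order once the evaluator catches up.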

A GitHub repository with all the code can be found here:

As for the future, I think I want to make it record 30ms sounds read at 60 Hz, and maybe fool around with some CNNs to see whether they improve the recognition.

Since I also control the data collection, I can simply add a few thousand more samples of certain sounds, so I might try training with 5000 samples per label instead of 1500.

submitted by /u/chaosparrot