[D] Voice Assistant: Better to use a model trained on commands or just use STT?

I would like to make a deep-learning based voice assistant for an application I have that controls a digital camera. Some example commands are “auto focus”, “set zoom to 2”, “turn off flash”, etc.

I see two ways of going about this:

  1. Train a model that classifies an audio snippet as containing one of the commands or background noise. This seems easier than option 2 but also less robust, as I would have to retrain the model every time I add a new command. Also not sure how numbers would work (record myself saying every number up to like 100?).

  2. Use STT to convert audio to text and do some fuzzy string matching to see if it matches a command. I’ve downloaded Mozilla’s DeepSpeech and it did not seem to work very well, so I’m guessing that creating a good STT model is very difficult.
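For what it's worth, the fuzzy-matching half of option 2 can be sketched in a few lines of standard-library Python. This is only a rough illustration, not a recommendation of a specific design: the `COMMANDS` list, the `{n}` slot marker, the `match_command` helper, and the 0.75 cutoff are all made up for the example, and it assumes the STT engine returns numbers as digits (if it returns words like "two", a word-to-number step would be needed first).

```python
import difflib
import re

# Hypothetical command templates; "{n}" marks a numeric slot.
COMMANDS = ["auto focus", "set zoom to {n}", "turn off flash"]


def match_command(transcript, cutoff=0.75):
    """Return (template, number) for the closest command, or None."""
    text = transcript.lower().strip()

    # Pull out the first integer so numbers are parsed, not learned.
    found = re.search(r"\d+", text)
    number = int(found.group()) if found else None

    # Swap the digits for the slot marker before comparing strings.
    normalized = re.sub(r"\d+", "{n}", text, count=1)

    # difflib scores similarity in [0, 1]; keep only the best candidate
    # at or above the (assumed) cutoff.
    best = difflib.get_close_matches(normalized, COMMANDS, n=1, cutoff=cutoff)
    if not best:
        return None

    template = best[0]
    return (template, number if "{n}" in template else None)
```

With this shape, adding a command means adding a string to the list rather than retraining anything, and the number is read straight from the transcript, so there is no need to record every value up to 100. How well it holds up depends entirely on the quality of the transcript.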

Which of these is a better approach? Or is there some in-between approach that’s even better?

submitted by /u/elmosworld37
