[D] A good Speech Recognition package?

Hi Reddit,

I will be working on a Speech Recognition project for the next few weeks or months. I don’t have any prior knowledge of the subject, but my rough guess is that a basic model is not far from an encoder-decoder architecture. I have to gain insight into the field and put a model in production by the end of the year.

For now, I just want to be able to transcribe audio data into text. First I have to understand the basics of audio data. I guess I will have to read some papers about Fourier transforms, spectrograms, denoising, filtering and so on.
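To make the Fourier transform and spectrogram terms concrete, here is a minimal sketch of a magnitude spectrogram built from a naive discrete Fourier transform. It assumes a toy input (a pure 1000 Hz sine sampled at 8000 Hz) rather than real speech, uses only the Python standard library, and the function names `hann`, `dft_magnitudes` and `spectrogram` are illustrative, not taken from any speech library. A real pipeline would likely use an FFT routine and mel-scaled features instead.

```python
import cmath
import math


def hann(n):
    """Hann window of length n (tapers frame edges to reduce spectral leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(signal, frame_len=64, hop=32):
    """Slide a windowed frame over the signal; return one magnitude spectrum per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        frames.append(dft_magnitudes(frame))
    return frames


# Toy input (an assumption for illustration): 512 samples of a 1000 Hz sine at 8000 Hz.
sample_rate = 8000
tone_hz = 1000
frame_len = 64
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

spec = spectrogram(signal, frame_len=frame_len, hop=32)

# Bin k corresponds to frequency k * sample_rate / frame_len.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / frame_len
print(len(spec), "frames,", len(spec[0]), "bins per frame, peak near", peak_hz, "Hz")
```

Each row of `spec` is one time slice and each column one frequency bin, which is the time-frequency grid that many speech models take as input in place of the raw waveform.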

I have a few questions for you though.

– First, do you have good resources (MOOCs, courses, …) for learning Speech Recognition? I looked for some and found a Stanford course from 2017. Given the syllabus, would you say it is a good resource to learn from?

– Then, is it worth implementing my own model from scratch, or should I use a pre-existing library? The audio data I want to train my model on is very task-dependent, and I don’t know if a pre-trained model would be good enough to recognize domain-specific terms. On the other hand, I won’t have as much data or computational power as Google to train my own model. Given these constraints, what library would you recommend? I think the ideal solution would be to use a pre-trained model and fine-tune it on my data. Of course, any relevant resources would be much appreciated 🙂

– Overall, what strategy would you recommend I follow? I don’t know where to look or where to start.

Thank you so much!

submitted by /u/lazywiing