[D] A good speech recognition package?
I will be working on a speech recognition project for the next few weeks or months. I don't have any prior knowledge of the subject, but my rough guess is that a basic architecture is not far from an encoder-decoder architecture. I have to gain insight into the field and put a model in production by the end of the year.
For now, I just want to be able to transcribe audio data into text. I first have to understand the basics of audio data, so I guess I will have to read some papers about Fourier transforms, spectrograms, denoising, filtering and so on.
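To make the spectrogram part concrete, here is a minimal sketch of what I understand it to be: cut the signal into short overlapping frames, window each one, and take the magnitude of its Fourier transform. It assumes NumPy and uses a synthetic 440 Hz tone instead of real speech. The function name `spectrogram` and the values `frame_len=400` and `hop=160` (25 ms and 10 ms at 16 kHz) are my own choices for illustration, not something taken from a particular library.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Magnitude spectrogram from a simple short-time Fourier transform."""
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps only the non-negative frequencies of a real-valued signal
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic test signal (assumption): one second of a 440 Hz sine at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(tone)                    # shape: (n_frames, frame_len // 2 + 1)
peak_bin = int(spec.mean(axis=0).argmax())  # strongest frequency bin on average
peak_hz = peak_bin * sample_rate / 400      # bin width = sample_rate / frame_len
print(spec.shape, peak_hz)
```

Each row is one time frame and each column one frequency bin, so the energy of the tone should sit in the bin closest to 440 Hz. As far as I understand, real pipelines usually go one step further (log-mel features, for instance), but this seems to be the basic idea.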
I have a few questions for you though.
– First, do you have good resources (MOOCs, courses, …) for learning speech recognition? I looked for some and found a Stanford course (http://web.stanford.edu/class/cs224s/syllabus.html) from 2017. Given the syllabus, would you say it is a good resource to learn from?
– Then, is it worth implementing my own model from scratch, or should I use an existing library? The audio data I want to train my model on is very task-specific, and I don't know if a pre-trained model would be good enough to recognize specific terms. On the other hand, I won't have as much data or computational power as Google to train my own model. Given these constraints, what library would you recommend? I think the ideal solution would be to use a pre-trained model and fine-tune it on my data. Of course, any relevant resources would be much appreciated 🙂
– Overall, what strategy would you recommend I follow? I don't know where to look or where to start.
Thank you so much!