[P] Google’s WaveNet API so good that its synthetic speech can be used to train hotword detectors with no ‘real’ data?

TLDR: Google TTS -> simple noise augmentation -> {wav files} -> Snowboy -> {.pmdl models} -> Raspberry Pi

So, I trained a black-box deep-net hotword detector (using Snowboy/kitt.ai) entirely on synthetic speech samples generated with Google’s Text-to-Speech API, and it was able to ‘transfer to the real world’ on a Raspberry Pi 3. Not entirely shocking, but reasonably neat, I suppose, given that you need to spend $0 for this (free GC credits + 100 free API calls from Snowboy + Colab).
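For anyone curious what the "simple noise augment" step might look like, here is a minimal sketch. It is not the code from the notebooks: it assumes the TTS output has already been saved as 16-bit mono WAV, and the function names (`add_noise`, `read_wav`, `write_wav`) and the white-Gaussian-noise-at-a-target-SNR approach are my own illustrative choices.

```python
import math
import random
import struct
import wave


def add_noise(samples, snr_db, rng=None):
    """Return a copy of int16 samples with white Gaussian noise added.

    The noise level is chosen so the signal-to-noise ratio is roughly
    snr_db decibels. Hypothetical helper, not part of any library.
    """
    rng = rng or random.Random()
    if not samples:
        return []
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noisy = []
    for s in samples:
        value = int(round(s + rng.gauss(0.0, sigma)))
        # Clip to the valid int16 range so the WAV writer does not overflow.
        noisy.append(max(-32768, min(32767, value)))
    return noisy


def read_wav(path):
    """Read a 16-bit mono WAV file; return (samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return samples, rate


def write_wav(path, samples, rate):
    """Write int16 samples as a 16-bit mono WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Usage would be something like reading one synthesized clip with `read_wav`, calling `add_noise` a few times at different SNRs (say 10, 15, and 20 dB), and writing each result out with `write_wav` to get several training variants per TTS sample. The SNR values here are placeholders, not the settings used in the project.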

Project picture: the final hardware setup

I’d posit that, at least for this problem space, we are not too far from a point where we can do text->model generation directly, sans any data collection.

Blog: https://towardsdatascience.com/build-your-own-custom-hotword-detector-with-zero-training-data-and-0-35adfa6b25ea

Code/Colab notebooks (pre-cleanup :P) : https://github.com/vinayprabhu/BurningMan2019

Demo Video: https://www.youtube.com/watch?time_continue=1&v=kIigaO6Iga0

submitted by /u/VinayUPrabhu