[P] Google’s WaveNet API is so good that its synthetic speech can be used to train hotword detectors with no ‘real’ data?
TLDR: Google TTS -> simple noise augment -> {wav files} -> Snowboy -> {.pmdl models} -> Raspberry Pi

So, I trained a black-box deep-net hotword detector (using Snowboy/kitt.ai) entirely on synthetic speech samples generated with Google’s Text-to-Speech API, and it was able to ‘transfer to the real world’ on a Raspberry Pi 3. Not entirely shocking, but reasonably neat, I suppose, given that you need to spend $0 for this (free GC credits + 100 free API calls from Snowboy + Colab).

Project picture: [image]

I’d posit that, at least for this problem space, we are not too far off from a point where we can do text -> model generation directly, sans any data collection.

Code/Colab notebooks (pre-cleanup :P): https://github.com/vinayprabhu/BurningMan2019

Demo video: https://www.youtube.com/watch?time_continue=1&v=kIigaO6Iga0

submitted by /u/VinayUPrabhu
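For anyone curious about the “simple noise augment” step in the pipeline above, here is a minimal sketch of one common way to do it: mix white noise into each synthetic waveform at a target SNR before handing the wav files to Snowboy. The function and parameter names (`add_noise`, `snr_db`) are illustrative assumptions, not taken from the linked repo.

```python
import numpy as np

def add_noise(speech: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Mix white Gaussian noise into `speech` at the given SNR (in dB)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(speech))
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(speech_power / noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: a 1-second, 16 kHz sine-tone "utterance" augmented at 20 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_noise(clean, snr_db=20.0)
```

Running this at a few different SNRs (e.g. 10/15/20 dB) per TTS sample is a cheap way to multiply the synthetic training set and make the resulting .pmdl models less brittle to real-room noise.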