[P] Google’s WaveNet API is so good that its synthetic speech can be used to train hotword detectors with no ‘real’ data?
TLDR: Google TTS -> simple noise augment -> {wav files} -> Snowboy -> {.pmdl models} -> Raspberry Pi

So, I trained a black-box deep-net hotword detector (using Snowboy/kitt.ai) entirely on synthetic speech samples generated with Google’s Text-to-Speech API, and it was able to ‘transfer to the real world’ on a Raspberry Pi 3. Not entirely shocking, but reasonably neat, I suppose, given that you need to spend $0 for this (free GC credits + 100 free API calls from Snowboy + Colab).

Project picture: [image]

I’d posit that, at least for this problem space, we are not too far off from a point where we can do text -> model generation directly, sans any data collection.

Code/Colab notebooks (pre-cleanup :P): https://github.com/vinayprabhu/BurningMan2019

Demo video: https://www.youtube.com/watch?time_continue=1&v=kIigaO6Iga0

submitted by /u/VinayUPrabhu
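For anyone curious about the “simple noise augment” step in the pipeline above, here is a minimal sketch of one common way to do it: mix white noise into each synthetic waveform at a target SNR before handing the wav files to Snowboy. The function and parameter names (`add_noise`, `snr_db`) are illustrative assumptions, not taken from the linked repo.

```python
import numpy as np

def add_noise(speech: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Mix white Gaussian noise into `speech` at the given SNR (in dB)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(speech))
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(speech_power / noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: a 1-second, 16 kHz sine-tone "utterance" augmented at 20 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = add_noise(clean, snr_db=20.0)
```

Running this at a few different SNRs (e.g. 10/15/20 dB) per TTS sample is a cheap way to multiply the synthetic training set and make the resulting .pmdl models less brittle to real-room noise.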