[D] What is the best implementation of a trainable TTS network for creating custom TTS voices?

In this instance, TTS refers to Text-To-Speech.

As the title implies, I am looking for the best way to train a network that produces high-quality text-to-speech output in a custom voice learned from training data. Assuming access to a large amount of high-quality English speech data from a single speaker, powerful machines, and long training times, what is the best implementation/codebase to use?

I have done quite a lot of research into this, but I have found the results confusing. Tacotron-2 seems to me to provide the highest-quality results among models with an open-source implementation. However, toolkits such as ESPnet(1) seem to be geared more towards comparing different methods than towards developing your own custom voice. I am not new to machine learning, but I am new to applying ML to audio or language-related problems, so I am well behind on the state of this line of research.

If I wanted to replicate something like the results from “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions”(2), where they used 20+ hours of data from an English speaker to produce very natural-sounding speech(3), what would be my best option? I figured I would ask the experts of Reddit before taking the plunge on setting up a codebase and dataset, only to realize there were significantly better options available.

Thanks!

(1) https://github.com/espnet/espnet

(2) (paper link) https://arxiv.org/abs/1712.05884

(3) (audio sample link) https://google.github.io/tacotron/publications/tacotron2/index.html

submitted by /u/blackfish_88