
[P] Not Jordan Peterson – Speech synthesis using Google’s Tacotron 2 and Nvidia’s Waveglow


The technology used to generate audio on this site is a combination of two neural network models that were trained using audio data of Dr. Peterson speaking, along with the transcript of his speech. If you don’t know who Jordan Peterson is or what his voice sounds like, you can find links to his podcast, lectures, and YouTube videos on his website.

The first model, developed at Google, is called Tacotron 2. It takes as input the text that you type and produces what is known as an audio spectrogram, which represents the amplitudes of the frequencies in an audio signal at each moment in time. The model is trained on text/spectrogram pairs, where the spectrograms are extracted from the source audio data using a short-time Fourier transform.
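To make the target representation concrete: a magnitude spectrogram is just the FFT amplitudes of short overlapping windows of audio. The sketch below is a plain-NumPy illustration of that representation, not Tacotron 2 itself; the `spectrogram` helper and its frame/hop sizes are chosen only for this example.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: per-frame FFT amplitudes (illustrative helper)."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Each row holds the frequency amplitudes for one moment in time.
    return np.abs(np.fft.rfft(frames, axis=1))

# A 440 Hz tone sampled at 8 kHz: its energy lands in one frequency bin.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone)

# Bin width is sr / frame_len = 31.25 Hz, so 440 Hz falls near bin 14.
peak_bin = int(spec.mean(axis=0).argmax())
print(spec.shape, peak_bin)
```

Tacotron 2 actually predicts mel-scaled spectrograms (frequencies warped toward human hearing), but the time/frequency grid is the same idea.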

The second model, developed at NVIDIA, is called Waveglow. It acts as a vocoder, taking in the spectrogram output of Tacotron 2 and producing a full audio waveform, which is what gets encoded into an audio file you can then listen to. The model is trained on spectrogram/waveform pairs of short segments of speech.
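Waveglow is a large flow-based network and can't be sketched in a few lines, but the problem it solves — recovering a waveform from magnitudes that carry no phase — has a classical baseline, Griffin–Lim iteration. The code below is that baseline in NumPy, shown only to illustrate what a vocoder does; it is not Waveglow, and all sizes and names here are assumptions made for the example.

```python
import numpy as np

FRAME, HOP = 256, 64
WIN = np.hanning(FRAME)

def stft(x):
    frames = [x[i:i + FRAME] * WIN for i in range(0, len(x) - FRAME + 1, HOP)]
    return np.fft.rfft(frames, axis=1)

def istft(S, length):
    # Weighted overlap-add synthesis back to a waveform.
    out, norm = np.zeros(length), np.zeros(length)
    frames = np.fft.irfft(S, n=FRAME, axis=1)
    for k, f in enumerate(frames):
        i = k * HOP
        out[i:i + FRAME] += f * WIN
        norm[i:i + FRAME] += WIN ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(mag, length, iters=50):
    # Start from random phase, then alternate between enforcing the target
    # magnitudes and projecting back to a consistent waveform.
    S = mag * np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(iters):
        x = istft(S, length)
        S = mag * np.exp(1j * np.angle(stft(x)))
    return istft(S, length)

np.random.seed(0)
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)

mag = np.abs(stft(tone))          # magnitudes only: phase is discarded
rebuilt = griffin_lim(mag, len(tone))

# How closely the rebuilt audio's magnitudes match the target spectrogram.
err = np.abs(np.abs(stft(rebuilt)) - mag).mean() / mag.mean()
print(err)
```

Griffin–Lim produces audible but buzzy speech; neural vocoders like Waveglow were developed precisely because learned models reconstruct phase far more naturally than this iterative projection.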

The implementations used to create this site were forked from NVIDIA’s public implementations of Waveglow and Tacotron 2.

Disclaimer: This is not my product; I found it online.

submitted by /u/satvikpendem