Skip to main content

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

LEARN, CONNECT, SHARE

Join our meetup, learn, connect, share, and get to know your Toronto AI community. 

JOB POSTINGS

INDEED POSTINGS

Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.

CONTACT

CONNECT WITH US

Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

[P] MelGAN vocoder implementation in PyTorch

[P] MelGAN vocoder implementation in PyTorch

Disclaimer: This is a third-party implementation. The original authors stated that they will be releasing code soon.

A recent research showed that fully-convolutional GAN called MelGAN can invert mel-spectrogram into raw audio in non-autoregressive manner. They showed that their MelGAN is lighter & faster than WaveGlow, and even can generalize to unseen speakers when trained on 3 male + 3 female speakers’ speech.

I thought this is a major breakthrough in TTS reserach, since both researchers and engineers can benefit from this fast & lightweight neural vocoder. So I’ve tried to implement this is PyTorch: see GitHub link w/ audio samples below.

Debugging was quite painful while implementing this. Changing the update order of G/D mattered much, and my generator’s loss curve is still going up. (Though results looks good when compared to original paper’s.)

Figure 1 from “MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis”

submitted by /u/seungwonpark
[link] [comments]