[Discussion] Understanding Subscale WaveRNN & usage of Masked Dilated CNN as conditioning network

Related Paper: Efficient Neural Audio Synthesis

I have been reading the sections on Subscale WaveRNN, in which the DeepMind team generates B samples in a single step. They discuss conditioning a given sample on past samples and on up to F samples of future context from the previous sub-tensors. For this conditioning they use a masked dilated CNN (see the last paragraph of Section 4.1, Subscale Dependency Scheme). Here's the relevant excerpt:

The Subscale WaveRNN that generates a given sub-tensor is conditioned on the future context of previous sub-tensors using a masked dilated CNN with relus and the mask applied over past connections instead of future ones.
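To make the dependency scheme concrete, here is a minimal sketch of how I currently read it. It assumes that sub-tensor s holds the interleaved samples s, s+B, s+2B, ... and that "up to F samples of future context" means positions up to i+F in each earlier sub-tensor. The function names `fold` and `context` are my own, not from the paper.

```python
def fold(samples, B):
    """Split a sequence into B interleaved sub-tensors.

    Assumption: sub-tensor s holds the samples at indices s, s+B, s+2B, ...
    """
    return [samples[s::B] for s in range(B)]


def context(s, i, B, F, length):
    """Original-sequence indices that sample i of sub-tensor s may condition on.

    Assumed reading: all earlier samples of the same sub-tensor, plus samples
    0..i+F of every previously generated sub-tensor z < s.
    """
    per_sub = length // B
    own_past = [B * j + s for j in range(i)]
    horizon = min(i + F + 1, per_sub)  # clip the look-ahead at the sequence end
    prev_subs = [B * k + z for z in range(s) for k in range(horizon)]
    return sorted(own_past + prev_subs)


subs = fold(list(range(16)), B=4)
# subs[0] == [0, 4, 8, 12], subs[1] == [1, 5, 9, 13]

# Sample i=1 of sub-tensor s=1 is original index 5.
ctx = context(s=1, i=1, B=4, F=1, length=16)
# ctx == [0, 1, 4, 8]: index 8 lies in the "future" of index 5, but it belongs
# to sub-tensor 0, which would already have been generated.
```

Under this reading, the "future context" is only future in terms of the original time axis; in generation order it is already available.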

My first question is: how could a masked dilated CNN help with this?
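My best guess so far, as a rough sketch only: a dilated convolution run over a previous sub-tensor where the taps pointing to the past are masked out, so each output position summarizes only the current and future samples. This is plain Python for illustration, not the paper's implementation; the function name `masked_dilated_conv1d`, the centered kernel layout, and the choice to keep the zero-offset tap are all my assumptions.

```python
def masked_dilated_conv1d(x, weights, dilation):
    """1D dilated convolution with the mask applied over past connections.

    Tap k reads x[t + (k - center) * dilation]. Taps with a negative offset
    (the past) are skipped, so output t depends only on x[t], x[t + dilation],
    and so on. A ReLU is applied to each output.
    """
    center = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            offset = (k - center) * dilation
            if offset < 0:
                continue  # masked: no connection to past samples
            pos = t + offset
            if pos < len(x):  # implicit zero padding past the end
                acc += w * x[pos]
        out.append(max(acc, 0.0))  # ReLU
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = masked_dilated_conv1d(x, weights=[10.0, 1.0, 1.0], dilation=2)
# y == [4.0, 6.0, 8.0, 4.0, 5.0]; the weight 10.0 on the past tap never contributes
```

If that is right, stacking such layers with growing dilation would give a receptive field reaching F samples ahead, but I am not sure this matches what the authors actually do.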

Next, Nal Kalchbrenner tweeted this quick demo of the Subscale WaveRNN. It confuses me a lot when I compare it with the original paper.

My final question is: has anyone taken a closer look at subscaling?

Any insights would be appreciated.

(Note: This is my first post and I am hoping that I followed the format correctly.)

submitted by /u/bigbawsboy