[Discussion] Understanding Subscale WaveRNN & usage of Masked Dilated CNN as conditioning network

Written by torontoai on November 21, 2019. Posted in Reddit MachineLearning.

Related Paper: Efficient Neural Audio Synthesis

I have been reading the sections on Subscale WaveRNN, in which the DeepMind team generates B samples in a single step. They discuss conditioning a given sample on past samples and on up to F samples of future context from the previous sub-tensors. For this they use a masked dilated CNN (see the last paragraph of Section 4.1, Subscale Dependency Scheme). Here’s the relevant excerpt:

The Subscale WaveRNN that generates a given sub-tensor is conditioned on the future context of previous sub-tensors using a masked dilated CNN with relus and the mask applied over past connections instead of future ones.

My first question is: how could a masked dilated CNN help with this?
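To make the question concrete, here is a minimal sketch of how I currently read "the mask applied over past connections": a 1D dilated convolution whose taps at negative offsets are zeroed, so each output only sees the current and future positions of an already generated sub-tensor. The function names, the single-channel list input, and the choice to keep the center tap are my own assumptions, not something the paper specifies.

```python
def future_masked_dilated_conv(x, weights, dilation):
    """Dilated 1D convolution with a ReLU, masked so that taps on past positions are dropped.

    x:        list of floats, one channel of a previously generated sub-tensor
    weights:  odd-length kernel, centered on the current position (hypothetical layout)
    dilation: spacing between kernel taps
    """
    k = len(weights)
    assert k % 2 == 1, "kernel is assumed to be centered, so its length must be odd"
    half = k // 2
    # Mask over past connections: taps with a negative offset get weight 0.
    mask = [0.0 if tap < half else 1.0 for tap in range(k)]
    out = []
    for t in range(len(x)):
        acc = 0.0
        for tap in range(k):
            pos = t + (tap - half) * dilation
            if 0 <= pos < len(x):  # positions outside the sequence count as zero padding
                acc += mask[tap] * weights[tap] * x[pos]
        out.append(max(acc, 0.0))  # ReLU
    return out


def future_horizon(dilations, kernel_size=3):
    """Number of future steps a stack of such layers can see."""
    return sum(d * (kernel_size // 2) for d in dilations)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(future_masked_dilated_conv(x, [1.0, 1.0, 1.0], dilation=2))
    print(future_horizon([1, 2, 4]))
```

If this reading is right, the dilations would set how far ahead the conditioning network looks, which would correspond to the F future samples, while the mask keeps it from reaching back into the past.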

Next, Nal Kalchbrenner tweeted this quick demo of the Subscale WaveRNN. It confuses me a lot when I refer back to the original paper.

My final question is: has anyone taken a closer look at subscaling?
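For anyone picking this up, this is my current understanding of the dependency scheme, written as a small sketch. It assumes the sequence is folded into B interleaved sub-tensors, and that a sample at position i of sub-tensor s may condition on its own sub-tensor's past plus positions up to i + F of every earlier sub-tensor. The helper names are hypothetical and this is my reading, not code from the paper, so corrections are welcome.

```python
def subscale_fold(u, B):
    """Fold sequence u into B interleaved sub-tensors: sub-tensor s holds u[s], u[s+B], ..."""
    return [u[s::B] for s in range(B)]


def visible_context(i, s, F, sub_len):
    """(sub-tensor, position) pairs that the sample at position i of sub-tensor s may see."""
    ctx = []
    # Earlier sub-tensors: their past plus up to F steps of future context.
    for z in range(s):
        for j in range(min(i + F + 1, sub_len)):
            ctx.append((z, j))
    # Same sub-tensor: strictly past positions only.
    for j in range(i):
        ctx.append((s, j))
    return ctx


if __name__ == "__main__":
    print(subscale_fold(list(range(8)), B=4))
    print(visible_context(i=1, s=1, F=1, sub_len=4))
```

Under these assumptions, sub-tensor s only needs earlier sub-tensors to be F + 1 positions ahead of it, which is how I understand several sub-tensors being generated in the same step.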

Any insights would be appreciated.

(Note: This is my first post and I am hoping that I followed the format correctly.)

submitted by /u/bigbawsboy
