[D] Question on "Language Modeling with Gated Convolutional Networks"
In this paper: https://arxiv.org/pdf/1612.08083.pdf
In Figure 1 I see Y = softmax(W H_L) and can't figure out what W is (I don't see it discussed in the paper). It appears to be different from the matrix of convolution kernels W in Equation 1.
Can someone help explain how Y is computed and what that W is?
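
To make the question concrete, here is a minimal sketch of one plausible reading. It assumes the W in Figure 1 is a separate output projection (vocabulary size by hidden size) applied to each position of the final hidden layer H_L, followed by a softmax over the vocabulary. That is an assumption on my part, not something I found stated in the paper, and the names `W_out` and `output_distribution` are hypothetical. It also uses a plain full softmax for simplicity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def output_distribution(W_out, h):
    """Compute softmax(W_out h) for one position's hidden vector h.

    W_out: vocab_size rows, each of length hidden_dim (assumed output
           projection, distinct from the conv kernels in Equation 1).
    h:     hidden vector of length hidden_dim, i.e. one position of H_L.
    """
    logits = [sum(w * x for w, x in zip(row, h)) for row in W_out]
    return softmax(logits)

# Toy sizes: vocabulary of 3 words, hidden dimension 2.
W_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.5, -0.5]
y = output_distribution(W_out, h)  # probability of each next word
```

Under this reading, Y would just be the result of applying `output_distribution` at every position of H_L, giving one next-word distribution per position.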
submitted by /u/ME_PhD