[D] LSTM backpropagation from scratch and training doubts
Here is my implementation of LSTM backpropagation from scratch. I am not sure about dfhs (the derivative with respect to the hidden state hs): am I computing it the right way? Please correct me where my code is wrong. I have tried many times, but the LSTM's predictions are still horrible. Do LSTMs simply take a very long time to train? Any suggestions on the backpropagation are highly appreciated, as I can't figure out what is going wrong. Thank you in advance.
So far I have tried increasing the number of iterations from the initial 5,000 to 10,000, decreasing the learning rate, and increasing the batch size.
Forward propagation to store all the state necessary for back prop
for i in range(nw):
    xp = np.zeros(xl)
    xp[intx] = 1
    x = np.hstack((hs[i-1],xp))
    xs[i] = x
    fg[i] = sigmoid(np.dot(x,wf))
    ig[i] = sigmoid(np.dot(x,wi))
    cg[i] = tangent(np.dot(x,wc))
    csc = (cs[i-1] * sigmoid(np.dot(x,wf))) + (sigmoid(np.dot(x,wi)) * sigmoid(np.dot(x,wc)))
    cs[i] = (cs[i-1] * sigmoid(np.dot(x,wf))) + (sigmoid(np.dot(x,wi)) * sigmoid(np.dot(x,wc)))
    og[i] = sigmoid(np.dot(x,wo))
    hs[i] = sigmoid(np.dot(x,wo)) * tangent(csc)
    hsc = sigmoid(np.dot(x,wo)) * tangent(csc)
    ys[i] = sigmoid(np.dot(hsc,wy))
    intx = np.argmax(vy[i-1])

dwy = np.zeros((yl,d))
dwf = np.zeros((xl+yl,yl))
dwi = np.zeros((xl+yl,yl))
dwc = np.zeros((xl+yl,yl))
dwo = np.zeros((xl+yl,yl))
dfhs = np.zeros(yl)
dfcs = np.zeros(yl)
totalError = 0
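For comparison, here is a minimal sketch of one forward step written with the standard LSTM equations, computing each gate once and reusing it. Note that the candidate uses tanh both where it is stored and where it enters the cell update, whereas the loop above stores `tangent(...)` in `cg[i]` but uses `sigmoid(np.dot(x,wc))` when computing `cs[i]`. The helper name `lstm_forward_step`, the sizes, and the random weights are assumptions for illustration only; biases are left out because the code above has none.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward_step(x_onehot, h_prev, c_prev, wf, wi, wc, wo):
    """One LSTM step (hypothetical helper, no biases)."""
    x = np.hstack((h_prev, x_onehot))   # same [hidden, input] layout as in the post
    f = sigmoid(x @ wf)                 # forget gate
    i = sigmoid(x @ wi)                 # input gate
    g = np.tanh(x @ wc)                 # candidate cell value (tanh, not sigmoid)
    o = sigmoid(x @ wo)                 # output gate
    c = f * c_prev + i * g              # new cell state
    h = o * np.tanh(c)                  # new hidden state
    return x, f, i, g, o, c, h

# Illustrative sizes and random weights (assumptions, not values from the post)
rng = np.random.default_rng(0)
n_in, n_hid = 5, 3
wf, wi, wc, wo = (rng.normal(scale=0.1, size=(n_hid + n_in, n_hid)) for _ in range(4))
x_onehot = np.zeros(n_in)
x_onehot[2] = 1
x, f, i, g, o, c, h = lstm_forward_step(
    x_onehot, np.zeros(n_hid), np.zeros(n_hid), wf, wi, wc, wo
)
print(h.shape, c.shape)
```

Storing `f`, `i`, `g`, `o`, and `c` from a single computation means the backward pass sees exactly the values that produced `h`, which makes mismatches like the one above harder to introduce.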
Back Propagation
for i in reversed(range(nw)):
    merror = ys[i] - vy[i]
    dwy += np.dot(np.atleast_2d(hs[i]).T,np.atleast_2d((merror*dsigmoid(ys[i]))))
    error = np.dot(merror,wy.T)
    totalError += np.sum(error)
    e = np.clip(error+dfhs,-6,6)
    dho = tangent(cs[i]) * e
    dho = dsigmoid(og[i]) * dho
    dwo += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dho))
    dc = og[i] * e * dtangent(cs[i])
    dc = np.clip(dc + dfcs,-6,6)
    dhf = cs[i-1] * dc
    dhf = dsigmoid(fg[i]) * dhf
    dwf += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhf))
    dhi = cg[i] * dc
    dhi = dsigmoid(ig[i]) * dhi
    dwi += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhi))
    dhc = ig[i] * dc
    dhc = dsigmoid(cg[i]) * dhi
    dwc += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhc))
    dfhs = np.dot(dho,wo.T)[:yl]+np.dot(dhc,wc.T)[:yl]+np.dot(dhi,wi.T)[:yl]+np.dot(dhf,wf.T)[:yl]
    dfcs = fg[i] * dc
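One way to settle whether dfhs and the gate gradients are right is a finite-difference gradient check on a single step. The sketch below is an assumption-laden reference, not a fix of the loop above: it uses a squared-error loss on a sigmoid output, a tanh candidate, no biases, and no clipping, and the names `forward`, `backward`, and `max_grad_error` are hypothetical. Two places where it differs from the posted loop: the gradient flowing back into `h` passes through the output sigmoid's derivative first (the post uses `np.dot(merror,wy.T)` without it), and the candidate gradient is `dc * i * (1 - g**2)` (the post computes `dsigmoid(cg[i]) * dhi`, which reuses `dhi` rather than `dhc`). Whether `dsigmoid` and `dtangent` expect activations or pre-activations is not shown in the post, so that is worth checking too.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, c_prev, W, target):
    """Single LSTM step plus sigmoid output layer; returns loss and cache."""
    f = sigmoid(x @ W["wf"])
    i = sigmoid(x @ W["wi"])
    g = np.tanh(x @ W["wc"])
    o = sigmoid(x @ W["wo"])
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    y = sigmoid(h @ W["wy"])
    loss = 0.5 * np.sum((y - target) ** 2)
    return loss, (f, i, g, o, c, h, y)

def backward(x, c_prev, W, target, cache, dh_next, dc_next):
    """Analytic gradients for one step; dh_next / dc_next come from step t+1."""
    f, i, g, o, c, h, y = cache
    dzy = (y - target) * y * (1 - y)          # through the output sigmoid
    grads = {"wy": np.outer(h, dzy)}
    dh = dzy @ W["wy"].T + dh_next            # total gradient reaching h
    tc = np.tanh(c)
    dc = dh * o * (1 - tc ** 2) + dc_next     # total gradient reaching c
    dzo = dh * tc * o * (1 - o)               # pre-activation gradients per gate
    dzf = dc * c_prev * f * (1 - f)
    dzi = dc * g * i * (1 - i)
    dzg = dc * i * (1 - g ** 2)               # tanh derivative for the candidate
    for name, dz in (("wf", dzf), ("wi", dzi), ("wc", dzg), ("wo", dzo)):
        grads[name] = np.outer(x, dz)
    dx = dzf @ W["wf"].T + dzi @ W["wi"].T + dzg @ W["wc"].T + dzo @ W["wo"].T
    n_hid = c.shape[0]
    # x = [h_prev, input], so the first n_hid entries of dx belong to h_prev
    return grads, dx[:n_hid], dc * f

def max_grad_error(x, c_prev, W, target, eps=1e-5):
    """Largest |numeric - analytic| gradient entry over all weight matrices."""
    _, cache = forward(x, c_prev, W, target)
    zeros = np.zeros_like(c_prev)
    grads, _, _ = backward(x, c_prev, W, target, cache, zeros, zeros)
    worst = 0.0
    for name, w in W.items():
        num = np.zeros_like(w)
        for idx in np.ndindex(*w.shape):
            old = w[idx]
            w[idx] = old + eps
            loss_plus, _ = forward(x, c_prev, W, target)
            w[idx] = old - eps
            loss_minus, _ = forward(x, c_prev, W, target)
            w[idx] = old                       # restore the weight
            num[idx] = (loss_plus - loss_minus) / (2 * eps)
        worst = max(worst, float(np.max(np.abs(num - grads[name]))))
    return worst

# Illustrative sizes and random data (assumptions, not values from the post)
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 3, 4
W = {k: rng.normal(scale=0.5, size=(n_hid + n_in, n_hid)) for k in ("wf", "wi", "wc", "wo")}
W["wy"] = rng.normal(scale=0.5, size=(n_hid, n_out))
x = np.hstack((rng.normal(size=n_hid), np.eye(n_in)[1]))
c_prev = rng.normal(size=n_hid)
target = np.eye(n_out)[2]
err = max_grad_error(x, c_prev, W, target)
print("max abs difference:", err)
```

If the analytic and numeric gradients agree to many decimal places on one step, the same check can be extended to a short unrolled sequence to exercise the `dh_next` and `dc_next` terms, which this single-step check leaves at zero. Clipping should be switched off during such a check, since it deliberately changes the gradient.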
submitted by /u/Dewanik-Koirala