[D]LSTMs backpropagation from scratch and its trainning doubts

Written by torontoai on July 17, 2019. Posted in Reddit MachineLearning.

Here is an implementation of LSTMs backpropagation from scratch , I am not sure with the dfhs (derivative of hs state) am I doing it the right way ? Please do correct me where I am wrong in my code I tried for so many times still the LSTMs prediction is horrible does it take longest time to train LSTMs ?? Any Suggestions on Back Propagation is highly appericiated as I can’t figure out what is going wrong – Thank you in advance

I tried increasing the iterations to 10,000 from 5,000 which was intially and tried decreasing the learning rate and increasing the batch size

Forward propagation to store all the state necessary for back prop

for i in range(nw): xp = np.zeros(xl) xp[intx] = 1 x = np.hstack((hs[i-1],xp)) xs[i] = x fg[i] = sigmoid(np.dot(x,wf)) ig[i] = sigmoid(np.dot(x,wi)) cg[i] = tangent(np.dot(x,wc)) csc = (cs[i-1] * sigmoid(np.dot(x,wf))) + (sigmoid(np.dot(x,wi)) * sigmoid(np.dot(x,wc))) cs[i] = (cs[i-1] * sigmoid(np.dot(x,wf))) + (sigmoid(np.dot(x,wi)) * sigmoid(np.dot(x,wc))) og[i] = sigmoid(np.dot(x,wo)) hs[i] = sigmoid(np.dot(x,wo)) * tangent(csc) hsc = sigmoid(np.dot(x,wo)) * tangent(csc) ys[i] = sigmoid(np.dot(hsc,wy)) intx = np.argmax(vy[i-1]) dwy = np.zeros((yl,d)) dwf = np.zeros((xl+yl,yl)) dwi = np.zeros((xl+yl,yl)) dwc = np.zeros((xl+yl,yl)) dwo = np.zeros((xl+yl,yl)) dfhs = np.zeros(yl) dfcs = np.zeros(yl) totalError = 0

Back Propagation

for i in reversed(range(nw)): merror = ys[i] - vy[i] dwy += np.dot(np.atleast_2d(hs[i]).T,np.atleast_2d((merror*dsigmoid(ys[i])))) error = np.dot(merror,wy.T) totalError += np.sum(error) e = np.clip(error+dfhs,-6,6) dho = tangent(cs[i]) * e dho = dsigmoid(og[i]) * dho dwo += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dho)) dc = og[i] * e * dtangent(cs[i]) dc = np.clip(dc + dfcs,-6,6) dhf = cs[i-1] * dc dhf = dsigmoid(fg[i]) * dhf dwf += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhf)) dhi = cg[i] * dc dhi = dsigmoid(ig[i]) * dhi dwi += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhi)) dhc = ig[i] * dc dhc = dsigmoid(cg[i]) * dhi dwc += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhc)) dfhs = np.dot(dho,wo.T)[:yl]+np.dot(dhc,wc.T)[:yl]+np.dot(dhi,wi.T)[:yl]+np.dot(dhf,wf.T)[:yl] dfcs = fg[i] * dc

submitted by /u/Dewanik-Koirala
[link] [comments]

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

JOB POSTINGS

CONTACT

[D]LSTMs backpropagation from scratch and its trainning doubts

Forward propagation to store all the state necessary for back prop

Back Propagation