[D] LSTM backpropagation from scratch and doubts about its training

Here is an implementation of LSTM backpropagation from scratch. I am not sure about dfhs (the derivative with respect to the hs state): am I computing it the right way? Please correct me where my code is wrong. I have tried many times and the LSTM's predictions are still horrible. Do LSTMs simply take a very long time to train? Any suggestions on the backpropagation are highly appreciated, as I can't figure out what is going wrong. Thank you in advance.

I tried increasing the iterations from the initial 5,000 to 10,000, decreasing the learning rate, and increasing the batch size.

Forward propagation to store all the state necessary for back prop

for i in range(nw):
    xp = np.zeros(xl)
    xp[intx] = 1
    x = np.hstack((hs[i-1],xp))
    xs[i] = x
    fg[i] = sigmoid(np.dot(x,wf))
    ig[i] = sigmoid(np.dot(x,wi))
    cg[i] = tangent(np.dot(x,wc))
    csc = (cs[i-1] * sigmoid(np.dot(x,wf))) + (sigmoid(np.dot(x,wi)) * sigmoid(np.dot(x,wc)))
    cs[i] = (cs[i-1] * sigmoid(np.dot(x,wf))) + (sigmoid(np.dot(x,wi)) * sigmoid(np.dot(x,wc)))
    og[i] = sigmoid(np.dot(x,wo))
    hs[i] = sigmoid(np.dot(x,wo)) * tangent(csc)
    hsc = sigmoid(np.dot(x,wo)) * tangent(csc)
    ys[i] = sigmoid(np.dot(hsc,wy))
    intx = np.argmax(vy[i-1])

dwy = np.zeros((yl,d))
dwf = np.zeros((xl+yl,yl))
dwi = np.zeros((xl+yl,yl))
dwc = np.zeros((xl+yl,yl))
dwo = np.zeros((xl+yl,yl))
dfhs = np.zeros(yl)
dfcs = np.zeros(yl)
totalError = 0
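One thing worth checking in the forward pass above: cg[i] is stored as tangent(...) of the candidate, but csc and cs[i] are built from sigmoid(np.dot(x,wc)), so the backward pass would be differentiating a value the cell state never used. Below is a minimal sketch of a single step that computes each gate once and reuses it. It assumes that tangent in the post means tanh and that the output size equals xl; the name lstm_step and the sizes are hypothetical, not from the post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_onehot, wf, wi, wc, wo, wy):
    """One forward step; returns every value a backward pass would need."""
    x = np.hstack((h_prev, x_onehot))  # same layout as the post: [hidden, input]
    f = sigmoid(x @ wf)                # forget gate
    i = sigmoid(x @ wi)                # input gate
    g = np.tanh(x @ wc)                # candidate: tanh here (assumption), not sigmoid
    o = sigmoid(x @ wo)                # output gate
    c = c_prev * f + i * g             # cell state built from the stored gate values
    h = o * np.tanh(c)                 # hidden state
    y = sigmoid(h @ wy)                # sigmoid output layer, as in the post
    return dict(x=x, f=f, i=i, g=g, o=o, c=c, h=h, y=y)

rng = np.random.default_rng(0)
xl, yl = 5, 3                          # hypothetical input and hidden sizes
wf, wi, wc, wo = (rng.normal(0, 0.1, (xl + yl, yl)) for _ in range(4))
wy = rng.normal(0, 0.1, (yl, xl))      # assumes output size == xl
x_onehot = np.zeros(xl)
x_onehot[2] = 1
step = lstm_step(np.zeros(yl), np.zeros(yl), x_onehot, wf, wi, wc, wo, wy)
print(step["y"].shape)
```

Storing the gates once and building c and h from those stored values keeps the forward and backward passes in agreement about which function produced each number.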

Back Propagation

for i in reversed(range(nw)):
    merror = ys[i] - vy[i]
    dwy += np.dot(np.atleast_2d(hs[i]).T,np.atleast_2d((merror*dsigmoid(ys[i]))))
    error = np.dot(merror,wy.T)
    totalError += np.sum(error)
    e = np.clip(error+dfhs,-6,6)
    dho = tangent(cs[i]) * e
    dho = dsigmoid(og[i]) * dho
    dwo += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dho))
    dc = og[i] * e * dtangent(cs[i])
    dc = np.clip(dc + dfcs,-6,6)
    dhf = cs[i-1] * dc
    dhf = dsigmoid(fg[i]) * dhf
    dwf += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhf))
    dhi = cg[i] * dc
    dhi = dsigmoid(ig[i]) * dhi
    dwi += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhi))
    dhc = ig[i] * dc
    dhc = dsigmoid(cg[i]) * dhi
    dwc += np.dot(np.atleast_2d(xs[i]).T,np.atleast_2d(dhc))
    dfhs = np.dot(dho,wo.T)[:yl]+np.dot(dhc,wc.T)[:yl]+np.dot(dhi,wi.T)[:yl]+np.dot(dhf,wf.T)[:yl]
    dfcs = fg[i] * dc
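A numerical gradient check is a quick way to tell whether dfhs and the other gradients are right before spending more iterations on training. The sketch below is not the poster's code. It assumes a squared-error loss and a tanh candidate, and it writes the derivatives out from the activations directly, since the post does not show how dsigmoid and dtangent are defined. The names forward, backward, and loss_and_grads are hypothetical. Compared with the loop above, it differs in four places:

- it sends merror times the sigmoid derivative back through wy, not merror alone
- it takes the tanh derivative at tanh(c), which matters if dtangent expects an activation the way dsigmoid appears to
- for wc it uses the tanh derivative with the candidate's own gradient, where the loop uses dsigmoid(cg[i]) * dhi
- it leaves out the clipping so the comparison is exact

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(h_prev, c_prev, xp, W):
    x = np.hstack((h_prev, xp))
    f, i, o = (sigmoid(x @ W[k]) for k in ("wf", "wi", "wo"))
    g = np.tanh(x @ W["wc"])                        # tanh candidate (assumption)
    c = c_prev * f + i * g
    h = o * np.tanh(c)
    y = sigmoid(h @ W["wy"])
    return dict(x=x, f=f, i=i, g=g, o=o, c=c, c_prev=c_prev, h=h, y=y)

def backward(s, target, W, dfhs, dfcs):
    """Gradients of 0.5*sum((y-target)**2) for one step, plus whatever flows
    in from the following step through dfhs (hidden) and dfcs (cell)."""
    yl = s["h"].size
    dy = (s["y"] - target) * s["y"] * (1 - s["y"])  # through the output sigmoid
    grads = {"wy": np.outer(s["h"], dy)}
    dh = dy @ W["wy"].T + dfhs
    tc = np.tanh(s["c"])
    dc = dh * s["o"] * (1 - tc ** 2) + dfcs
    pre = {                                         # gradients at the pre-activations
        "wo": dh * tc * s["o"] * (1 - s["o"]),
        "wf": dc * s["c_prev"] * s["f"] * (1 - s["f"]),
        "wi": dc * s["g"] * s["i"] * (1 - s["i"]),
        "wc": dc * s["i"] * (1 - s["g"] ** 2),      # tanh derivative, candidate's own gradient
    }
    for k, d in pre.items():
        grads[k] = np.outer(s["x"], d)
    dx = sum(d @ W[k].T for k, d in pre.items())
    return grads, dx[:yl], s["f"] * dc              # new dfhs, new dfcs

def loss_and_grads(W, inputs, targets, yl):
    h, c, steps = np.zeros(yl), np.zeros(yl), []
    for xp in inputs:
        s = forward(h, c, xp, W)
        steps.append(s)
        h, c = s["h"], s["c"]
    loss = sum(0.5 * np.sum((s["y"] - t) ** 2) for s, t in zip(steps, targets))
    total = {k: np.zeros_like(v) for k, v in W.items()}
    dfhs, dfcs = np.zeros(yl), np.zeros(yl)
    for s, t in zip(reversed(steps), reversed(targets)):
        g, dfhs, dfcs = backward(s, t, W, dfhs, dfcs)
        for k in total:
            total[k] += g[k]
    return loss, total

rng = np.random.default_rng(0)
xl, yl = 4, 3                                       # hypothetical sizes
W = {k: rng.normal(0, 0.5, (xl + yl, yl)) for k in ("wf", "wi", "wc", "wo")}
W["wy"] = rng.normal(0, 0.5, (yl, xl))              # assumes output size == xl
inputs = [np.eye(xl)[j] for j in (0, 2, 1)]
targets = [np.eye(xl)[j] for j in (2, 1, 3)]
_, analytic = loss_and_grads(W, inputs, targets, yl)

eps, max_err = 1e-5, 0.0
for k in W:                                         # central differences, one weight at a time
    for idx in np.ndindex(W[k].shape):
        old = W[k][idx]
        W[k][idx] = old + eps
        lp, _ = loss_and_grads(W, inputs, targets, yl)
        W[k][idx] = old - eps
        lm, _ = loss_and_grads(W, inputs, targets, yl)
        W[k][idx] = old
        max_err = max(max_err, abs((lp - lm) / (2 * eps) - analytic[k][idx]))
print("max abs difference:", max_err)
```

If the printed difference is tiny, the recursion for dfhs and dfcs is consistent with the forward pass; a large value means at least one gradient term disagrees with the forward pass. The same check can be run against the original loop by swapping in its forward and backward code, ideally with the np.clip calls disabled while checking.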

submitted by /u/Dewanik-Koirala