
I want to use two neural networks to calculate the Q-values for my current state (a board game). I use a sigmoid function. I correct only the output for the action with the highest value, max Q, which I obtain from the target network (the targets for the remaining outputs are set to the original output of the DQN). Is this the correct approach, or should I correct all output values in one iteration?

Second question: how do I calculate my target value r + b max Q(s′, …)? Should I use fixed reward values? **Do I have to choose reward values so that the target cannot exceed the output range of the sigmoid function?** (Like 0.4 or 0.2.) Thank you.
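To make the setup in the question concrete, here is a minimal NumPy sketch of how such a target vector is commonly built. It is an illustration under stated assumptions, not a definitive implementation: `b` is taken to be the discount factor, the function name `dqn_targets` and all of its arguments are hypothetical, and the output that gets corrected is the one for the action actually taken in the stored transition (the usual DQN convention), while max Q comes from the target network evaluated on the next state. All other outputs keep the DQN's own prediction, so their error is zero.

```python
import numpy as np

def dqn_targets(q_online, q_target_next, actions, rewards, dones, b=0.9):
    """Build training targets for a batch of transitions (hypothetical helper).

    q_online:      (batch, n_actions) DQN outputs for the states s
    q_target_next: (batch, n_actions) target-network outputs for the next states s'
    actions:       (batch,) index of the action taken in s
    rewards:       (batch,) reward r received for that action
    dones:         (batch,) 1.0 if s' is terminal, else 0.0
    b:             discount factor (assumed meaning of "b" in the question)
    """
    # Start from the DQN's own outputs so untouched actions contribute no error.
    targets = q_online.copy()
    # Bootstrap value: highest Q-value the target network assigns in s'.
    max_next = q_target_next.max(axis=1)
    # r + b * max Q(s', .), with no bootstrap term after a terminal state.
    td_target = rewards + b * (1.0 - dones) * max_next
    # Overwrite only the entry of the action that was taken.
    targets[np.arange(len(actions)), actions] = td_target
    return targets

# Toy batch with two transitions and three possible moves.
q_online = np.array([[0.2, 0.5, 0.1],
                     [0.3, 0.3, 0.6]])
q_target_next = np.array([[0.4, 0.7, 0.2],
                          [0.9, 0.1, 0.3]])
actions = np.array([0, 2])
rewards = np.array([0.0, 0.2])
dones = np.array([0.0, 1.0])  # the second transition ends the game

targets = dqn_targets(q_online, q_target_next, actions, rewards, dones, b=0.5)
print(targets)
```

On the sigmoid range: since max Q from a sigmoid output is at most 1, keeping rewards within 0 ≤ r ≤ 1 − b guarantees r + b · max Q ≤ 1, so the target stays reachable. Giving a reward only on terminal moves, where the bootstrap term is dropped, also keeps targets within [0, 1] as long as that reward is.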

submitted by /u/Kralex68