[D] Question about deep Q learning
Hello, I am implementing deep Q-learning for a two-player board game. After every move it is the other player's turn. I want to compute max over a' of Q(s', a') to get the maximum Q-value for the next state, but my problem is that the next state is evaluated from the opponent's perspective. So that max Q is the best value for my opponent, whereas I want to maximize MY chances of winning. How do I proceed? Should I instead evaluate the states where it is my turn again (a depth of 2)?
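To make the problem concrete, here is a minimal sketch of one option that is often used for zero-sum, strictly alternating games: keep a single network that always scores the position for the player to move, and negate the opponent's best value when building the target (a negamax-style target). This assumes the game is zero-sum and that rewards are given from the mover's point of view; the function name `negamax_target` and its parameters are hypothetical, not from any library.

```python
from typing import Sequence


def negamax_target(
    reward: float,
    next_q_values: Sequence[float],
    done: bool,
    gamma: float = 0.99,
) -> float:
    """Training target for Q(s, a), seen by the player who just moved.

    next_q_values holds Q(s', a') for the legal actions in s', scored
    from the opponent's viewpoint (the opponent is to move in s').
    Assumption: the game is zero-sum, so the opponent's best value is
    the mover's worst, hence the minus sign.
    """
    if done or len(next_q_values) == 0:
        # Terminal position: nothing to bootstrap from.
        return reward
    return reward - gamma * max(next_q_values)


# The opponent's best reply is worth 0.8 to them, so -0.4 to me at gamma=0.5.
target = negamax_target(reward=0.0, next_q_values=[0.2, 0.8], done=False, gamma=0.5)
```

With this convention there is no need to look two plies ahead, since every state is valued for whoever is to move. The depth-2 alternative (treating the opponent's reply as part of the environment) should also work, but the learned values then depend on how that opponent plays.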
submitted by /u/Kralex68