[D] Question about rewards in deep Q-learning
Hello, I want to create a deep Q-learning agent for a two-player board game. My rewards are 250 for winning the game and 100, 80, and 50 for “good” moves. My tutor told me that I should normalize the rewards because there are only a limited number of distinct reward values, while there are possibly infinitely many Q-values. How should I normalize the rewards? Should I scale them to the [0, 1] range so that, for example, 0.8 represents a game-winning move and 0.3 a good move?
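
To make the question concrete, here is a minimal sketch of one possible scaling, assuming plain division by the largest reward (250) so that the relative sizes of the rewards are preserved. The names `RAW_REWARDS` and `normalize_reward`, and the event labels, are made up for illustration; this is only one option, not necessarily what my tutor had in mind.

```python
# Raw rewards from the post; the event labels are hypothetical placeholders.
RAW_REWARDS = {"win": 250.0, "good_a": 100.0, "good_b": 80.0, "good_c": 50.0}


def normalize_reward(raw, max_abs=250.0):
    """Divide by the largest reward magnitude so results lie in [0, 1] here."""
    return raw / max_abs


# Apply the same scale factor to every reward so their ratios stay unchanged.
normalized = {name: normalize_reward(r) for name, r in RAW_REWARDS.items()}

for name, value in normalized.items():
    print(f"{name}: {value:.2f}")
```

With this scaling the win maps to 1.0 and the good moves to 0.4, 0.32, and 0.2, rather than hand-picked values such as 0.8 and 0.3.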