[D] Having trouble understanding some parts of AlphaZero
These are my questions:
AlphaZero plays 25,000 games against itself. After each game, is the MCTS tree reset for the next game, or is it kept?
Is each neural network trained on 1000 batches of 2048 game positions (1000 × 2048 = 2,048,000 positions), or can it be trained on more than 2,048,000 positions?
After a new neural network is chosen, is the MCTS tree reset (thrown away) or kept? Are only parts of it reset? For example, all of the nodes in the tree hold P values from the previous neural network, so do these get recalculated? Are the W values kept the same even though they were all accumulated from V values produced by the previous neural network?
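
To make the notation concrete, here is a minimal Python sketch of the per-node statistics I'm asking about, as I understand them. The names `Node` and `backup`, and the convention of flipping the sign of V at each ply, are my own assumptions for illustration and are not taken from the paper or any official implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical MCTS node holding the statistics from the questions above."""
    prior: float                 # P: move probability from the network's policy output
    visit_count: int = 0         # N: number of simulations that passed through this node
    total_value: float = 0.0     # W: sum of all V's backed up through this node
    children: dict = field(default_factory=dict)  # move -> Node

    @property
    def mean_value(self) -> float:
        """Q = W / N, defined as 0 for an unvisited node."""
        if self.visit_count == 0:
            return 0.0
        return self.total_value / self.visit_count


def backup(path, value):
    """Add a leaf evaluation V to every node on the search path.

    Assumption: `value` is from the perspective of the last node in `path`,
    and the sign flips at each ply because the players alternate.
    """
    for node in reversed(path):
        node.visit_count += 1
        node.total_value += value
        value = -value


root = Node(prior=1.0)
child = Node(prior=0.6)
root.children["some_move"] = child
backup([root, child], value=0.5)
```

If the tree were kept after a new network is chosen, `prior` and `total_value` in a structure like this would still hold numbers derived from the old network, which is what the last question is about.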