[D] IMPALA vs MCTS for self-play
AlphaStar uses IMPALA rather than tree search. Comments here explain that this is mainly due to the width of the action space. But conceptually, I never grasped why one method makes better use of a given “exploration budget”.
A. Is it just tree width or also the episode length?
B. Someone (maybe Vinyals?) mentioned that “it would be hard to saturate the GPU” with tree search. So if SC2 were a lightweight, reversible environment, would a (narrow?) tree search become feasible?
(Let's ignore issues such as hidden information, the agent league, real time, and the build order assistance.)
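To make question A concrete, here is a rough sketch of how I picture the budget problem. It assumes a uniform tree that is expanded exhaustively, which real MCTS does not do (it prunes with a policy prior), and the function name and the numbers are made up for illustration, not measured from any game:

```python
def full_depth(budget: int, branching: int) -> int:
    """Deepest ply d such that plies 1..d of a uniform tree with
    `branching` children per node contain at most `budget` nodes.

    Toy model only: assumes exhaustive expansion and a constant
    branching factor, both of which are simplifying assumptions.
    """
    if branching < 1:
        raise ValueError("branching must be at least 1")
    depth = 0   # plies fully covered so far
    nodes = 0   # nodes spent on those plies
    layer = 1   # number of nodes in the current ply (root = 1)
    while True:
        layer *= branching           # size of the next ply
        if nodes + layer > budget:   # cannot afford the whole ply
            return depth
        nodes += layer
        depth += 1


if __name__ == "__main__":
    budget = 1_000_000  # hypothetical number of simulations
    for b in (2, 10, 1000):
        print(f"branching={b:>4}: full coverage to ply {full_depth(budget, b)}")
```

In this toy model, widening the tree shrinks the reachable depth very quickly, while episode length only matters once it exceeds that depth, which is part of why I am unsure how the two factors trade off.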
Thanks for any comments.