[D] MCTS on raw network not trained with MCTS

In the AlphaGo Zero paper, figure 6b compares the performance of the raw network, which directly takes the move with the highest policy probability (not a q-value), against an MCTS approach that gets 5 seconds of thinking time per move. The MCTS approach shows a large performance gain over the raw network.

Now I have trained a network with a policy head and a value head using the first approach: it acts greedily on the raw network output and keeps no tree structure or accompanying statistics (such as visit counts per node). I’m wondering whether I can skip training with MCTS and only use the trained network to build a tree in the simulation phase, and whether there’s any precedent for this technique.
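To make the question concrete, here is a minimal sketch of what search-only use of a trained network could look like: PUCT-style MCTS where the policy head supplies priors and the value head replaces rollouts. Everything named here is an assumption for illustration: `net`, `step` and `is_goal` are hypothetical callables, the toy chain environment and the uninformative stub network stand in for the real problem, and the discount `gamma` is my own addition to make shorter paths score higher.

```python
import math


class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the policy head
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # action -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(child, parent_total, c_puct):
    # The +1 under the square root keeps exploration alive before any child is visited.
    u = c_puct * child.prior * math.sqrt(parent_total + 1) / (1 + child.visits)
    return child.q() + u


def puct_search(root_state, net, step, is_goal,
                n_sims=200, c_puct=1.5, gamma=0.95, max_depth=50):
    """Run PUCT simulations guided by a pretrained policy/value network.

    Returns the visit count of each action at the root.
    """
    root = Node(prior=1.0)
    for _ in range(n_sims):
        node, state, path, depth = root, root_state, [root], 0
        # Selection: descend through already expanded nodes.
        while node.children and not is_goal(state) and depth < max_depth:
            total = sum(ch.visits for ch in node.children.values())
            action, node = max(node.children.items(),
                               key=lambda kv: puct_score(kv[1], total, c_puct))
            state = step(state, action)
            path.append(node)
            depth += 1
        # Expansion and evaluation: the network is only queried at the leaf.
        if is_goal(state):
            value = 1.0
        elif depth >= max_depth:
            value = 0.0
        else:
            priors, value = net(state)
            for a, p in priors.items():
                node.children[a] = Node(p)
        # Backup: discount on the way up so that shorter paths score higher.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += value
            value *= gamma
    return {a: ch.visits for a, ch in root.children.items()}


# Toy stand-in for the real problem: a chain of states 0..5 with the goal at 5.
def step(state, action):
    return max(0, state + action)

def is_goal(state):
    return state == 5

def net(state):
    # Hypothetical stub: uniform priors and a zero value estimate.
    return {-1: 0.5, 1: 0.5}, 0.0

visits = puct_search(3, net, step, is_goal)
best_action = max(visits, key=visits.get)
```

Acting on `best_action` and re-running the search from the next state would mirror how the visit counts are used at play time. Note that the tree treats a revisited state as a fresh node, so loops are bounded only by `max_depth` and made unattractive by the discount.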

The problem is a deterministic RL problem with a single goal state and no other rewards. The same state can be reached more than once, and this often happens with the raw network approach; the agent then gets stuck in a loop. In a previous post, someone suggested taking the next-best action once a state has been reached more than once. This worked like a charm. But for a real-world application, I would like to keep the number of actions taken as low as possible, which is why I think MCTS might be an improvement.
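For reference, the loop-breaking workaround could be sketched roughly as follows. This is one possible reading of the suggestion, not necessarily what was originally proposed: on the k-th revisit of a state, take the k-th ranked action, cycling through the ranking. The names `policy`, `step` and `is_goal` are hypothetical, states are assumed to be hashable, and the toy chain with a deliberately bad policy at state 2 is only there to provoke a loop.

```python
from collections import Counter


def greedy_rollout(start, policy, step, is_goal, max_steps=100):
    """Follow the policy greedily, but on each revisit of a state
    fall back to the next-ranked action (cycling through the ranking)."""
    seen = Counter()            # how often each state has been visited so far
    state, actions = start, []
    for _ in range(max_steps):
        if is_goal(state):
            return actions
        probs = policy(state)
        ranked = sorted(probs, key=probs.get, reverse=True)
        action = ranked[seen[state] % len(ranked)]
        seen[state] += 1
        actions.append(action)
        state = step(state, action)
    return None                 # goal not reached within max_steps


# Toy chain 0..3 with the goal at 3; the stub policy points backwards at state 2.
def step(state, action):
    return max(0, state + action)

def is_goal(state):
    return state == 3

def policy(state):
    if state == 2:
        return {-1: 0.9, 1: 0.1}
    return {1: 0.9, -1: 0.1}

path = greedy_rollout(0, policy, step, is_goal)
```

On this toy chain the rollout does reach the goal, but only after a detour that is longer than the three-step shortest path, which is the inefficiency I would like a tree search to remove.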

submitted by /u/matigekunst