[P] A Chess/Go/Shogi model that passes the Turing test: how do I build an imitation learning model that incorporates some kind of lookahead algorithm?
I want to build a model for Chess/Go/Shogi that is trained and tested on games by real players, and I want it to pass the Turing test. I don’t want my model to play the best move in a position; I want it to play the move that a person (of a certain strength, at a certain time control, etc.) would play.
It’s easy to make this a classification problem and train a CNN on a one-hot encoded policy of the moves actually played. The problem is that without some kind of look-ahead algorithm (MCTS, for example), the model fails to learn sequences that require multiple moves, such as tactics.
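For concreteness, here is a minimal sketch of the classification objective described above, in plain Python and without the CNN itself. The 4-move vocabulary and the logit values are made up for illustration; in a real setup the logits would come from the network and the index from the move a human actually played.

```python
import math

def softmax(logits):
    """Turn raw move scores into a probability distribution over moves."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def move_cross_entropy(logits, played_index):
    """Cross-entropy against a one-hot target reduces to -log p(played move)."""
    probs = softmax(logits)
    return -math.log(probs[played_index])

# Hypothetical scores for a toy vocabulary of 4 candidate moves.
logits = [2.0, 0.5, -1.0, 0.0]

# Loss when the human played move 0 (which the model already favours)...
loss_when_expected = move_cross_entropy(logits, 0)
# ...versus when the human played move 2 (which the model considers unlikely).
loss_when_surprising = move_cross_entropy(logits, 2)
```

The loss only looks at the single move played in the current position, which is why nothing in this objective pushes the model to reason about multi-move sequences.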
However, current MCTS/alpha-beta/minimax approaches require an evaluation of the leaf nodes, and I don’t see how to shape the reward into such a leaf-node evaluation. So my question: how would I incorporate a look-ahead algorithm into an imitation learning problem like this?
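To make the dependency on leaf evaluation explicit, here is a rough sketch of depth-limited minimax on a toy tree. The tree, the leaf values, and the `children`/`evaluate` helpers are all hypothetical. The point is only that the search cannot return anything without some `evaluate` function, and that is exactly the piece an imitation objective does not obviously supply.

```python
def minimax(node, depth, maximizing, children, evaluate):
    """Depth-limited minimax; `evaluate` is called at every leaf or cutoff node."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)  # the search bottoms out in an evaluation
    values = [
        minimax(kid, depth - 1, not maximizing, children, evaluate)
        for kid in kids
    ]
    return max(values) if maximizing else min(values)

# Hypothetical toy game tree: two moves at the root, two replies to each.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
# Hypothetical leaf scores; in a real engine these come from an evaluation function.
LEAF_VALUES = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}

def children(node):
    return TREE.get(node, [])

def evaluate(node):
    # Nodes without a known score default to 0 in this toy setup.
    return LEAF_VALUES.get(node, 0)

value_depth_2 = minimax("root", 2, True, children, evaluate)
value_depth_1 = minimax("root", 1, True, children, evaluate)
```

With the full depth the result is driven entirely by the leaf scores. With the search cut off at depth 1 it falls back to whatever `evaluate` says about the interior nodes, so the quality of the look-ahead is bounded by the quality of that function.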