[P] Learning to play Tetris, again
This is a follow-up to my original post on using MCTS and TD learning to solve the Tetris environment.
Since I have made quite a bit of progress, I thought it would be interesting to share my latest results with you.
And here are the average scores (according to the Tetris guideline) and line clears at each iteration,
where each iteration consists of 50 games of normal play (500 simulations per move (SPM), used for training) and 1 game of benchmark play (1000 SPM). The games shown in the video are the benchmark plays at each iteration.
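For concreteness, here is a minimal sketch of that schedule in Python. This is not the actual repo code: `play_game` and `train` are hypothetical placeholders for the real self-play (MCTS) and TD-update routines.

```python
# A minimal sketch of the iteration schedule described above,
# not the actual repo code. `play_game` and `train` are
# hypothetical placeholders for the real self-play and TD routines.

GAMES_PER_ITERATION = 50  # normal play, used for training
TRAIN_SPM = 500           # MCTS simulations per move during training
BENCHMARK_SPM = 1000      # simulations per move in the benchmark game

def run_iteration(agent, replay_buffer, play_game, train):
    # Normal play: collect trajectories for TD learning.
    for _ in range(GAMES_PER_ITERATION):
        replay_buffer.extend(play_game(agent, spm=TRAIN_SPM))
    train(agent, replay_buffer)
    # Benchmark play: one higher-budget game to track progress.
    return play_game(agent, spm=BENCHMARK_SPM)
```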
The previous agent was only able to achieve ~15 line clears after 800 games. This new agent, however, achieves ~1.3k line clears at 750 games (iteration 15), which is about 100 times better. Furthermore, the maximum line clear count was 5678 (what a coincidence) in one of the games at iteration 15, about 4 times higher than the maximum achieved by the previous agent. Also note that the new agent was trained on the raw score, which is a noisier and harder signal to learn from than the number of line clears.
As a not-so-fair comparison, I found an old paper in which an agent using a complex handcrafted reward function along with TD learning was able to achieve ~8k line clears after ~70k games. Personally, I believe my agent could reach a similar number with significantly fewer games, since the scores were still increasing super-exponentially in the later iterations. Unfortunately, the agent was starting to generate more than 8GB of data, which is more than the RAM of my potato, so I had to terminate training there.
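One generic way around that memory blow-up would be a fixed-size replay buffer that evicts the oldest transitions. A minimal sketch (the 500,000-transition cap is an arbitrary example, not a number from my setup):

```python
from collections import deque

# Fixed-size buffer: once full, appending evicts the oldest transitions,
# so memory stays bounded no matter how many games are played.
# The cap of 500_000 is an arbitrary example value.
replay_buffer = deque(maxlen=500_000)

def add_trajectory(buffer, trajectory):
    # extend() past maxlen silently drops entries from the opposite end.
    buffer.extend(trajectory)
```

The trade-off is that old games are forgotten; reservoir sampling or disk-backed storage would keep them at the cost of extra bookkeeping.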
If you are interested, a more detailed description and the code can be found in my GitHub repository.
Thanks for reading, let me know what you think!