[P] 𝝐-Explore, a simple alternative to RL for computer chess
I am a second-year student at the University of California, Merced, and this is a project I’ve been working on over the last few months. It’s not state-of-the-art or anything like that, but any feedback on my work would be much appreciated. Keep in mind, I don’t have a degree (yet) in Computer Science, so any form of constructive criticism will be helpful!
You can find my code at: https://github.com/PhilipFelizarta/epsilon-Explore
Since the creation of AlphaZero, a majority of Deep Learning research and engineering for computer chess has been centered around the “Zero” doctrine; that is, creating a chess engine that uses zero human knowledge. While AlphaZero (and Leela Chess Zero) are grand milestones for AI, a common critique is the computational cost of running these reinforcement learning algorithms. Motivated to create an efficient yet scalable learning algorithm, I propose an elementary yet novel solution: 𝝐-Explore. 𝝐-Explore is a handcrafted adaptation of epsilon-greedy exploration, Go-Explore, and supervised learning that frames exploration tasks as continual learning and uses significantly fewer computational resources than state-of-the-art reinforcement learning algorithms. All experiments use only a single GPU (Titan RTX) and a single CPU (16-core Threadripper). The results of 𝝐-Explore are not state-of-the-art with my experimental setup, but they provide a foundation for creating more efficient handcrafted algorithms in other large search spaces where an expert policy is available.
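For readers who haven't seen epsilon-greedy exploration before, here is a minimal sketch of the general idea: with probability epsilon play a random legal move, otherwise play the move an expert policy rates highest. This is only an illustration of that one ingredient, not the code in the repository. The function name `epsilon_greedy_move`, the move strings, and the toy scores are all made up for the example.

```python
import random


def epsilon_greedy_move(legal_moves, expert_score, epsilon, rng=random):
    """Pick a move: random with probability epsilon, else the expert's top choice.

    legal_moves  -- non-empty list of candidate moves (hypothetical move strings here)
    expert_score -- callable mapping a move to a number; stands in for an expert policy
    epsilon      -- exploration rate in [0, 1]
    rng          -- source of randomness (the random module or a random.Random instance)
    """
    if not legal_moves:
        raise ValueError("no legal moves to choose from")
    if rng.random() < epsilon:
        # Explore: ignore the expert and sample uniformly.
        return rng.choice(legal_moves)
    # Exploit: follow the expert's highest-rated move.
    return max(legal_moves, key=expert_score)


# Toy "expert" scores, invented purely for illustration.
toy_scores = {"e2e4": 0.6, "d2d4": 0.5, "a2a3": 0.1}
moves = list(toy_scores)
rng = random.Random(0)

# epsilon = 0 always follows the expert.
greedy_pick = epsilon_greedy_move(moves, toy_scores.get, epsilon=0.0, rng=rng)

# epsilon = 1 always explores, so weaker moves get visited too.
explored = [
    epsilon_greedy_move(moves, toy_scores.get, epsilon=1.0, rng=rng)
    for _ in range(200)
]
```

In a real setup the expert scores would come from an engine or a trained network rather than a dictionary, and epsilon would be tuned to trade off following the expert against visiting new positions.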
Note: I’ll be continually updating this GitHub repository as I do more tests!