[P] 𝝐-Explore, a simple alternative to RL for computer chess

Hi y’all,

I am a second-year at the University of California, Merced, and this is a project I’ve been working on over the last few months. It’s not state-of-the-art or anything like that, but any feedback on my work would be much appreciated. Keep in mind, I don’t have a degree (yet) in Computer Science, so any form of constructive criticism will be helpful!

You can find my code at: https://github.com/PhilipFelizarta/epsilon-Explore

Quick Summary:

Since the creation of AlphaZero, most Deep Learning research and engineering for computer chess has centered on the “Zero” doctrine; that is, building a chess engine that uses zero human knowledge. While AlphaZero (and Leela Zero) are grand milestones for AI, a common critique is the computational cost of running these reinforcement learning algorithms. Motivated to create an efficient yet scalable learning algorithm, I propose an elementary yet novel solution: 𝝐-Explore. 𝝐-Explore is a handcrafted adaptation of epsilon-greedy exploration, Go-Explore, and supervised learning that frames exploration tasks as continual learning and uses significantly fewer computational resources than state-of-the-art reinforcement learning algorithms. All experiments use only a single GPU (RTX Titan) and a single CPU (16-core Threadripper). The results of 𝝐-Explore are not state-of-the-art with my experimental setup, but they provide a foundation for creating more efficient handcrafted algorithms in other large search spaces, given an available expert policy.
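
For anyone unfamiliar with the epsilon-greedy exploration mentioned in the summary, here is a minimal sketch of the generic idea. It is not taken from the repository and is not the 𝝐-Explore algorithm itself; the function name `epsilon_greedy`, the `scores` list (standing in for whatever a policy assigns to each legal move), and the `rng` parameter are all hypothetical names chosen for illustration.

```python
import random


def epsilon_greedy(scores, epsilon, rng=random):
    """Return an index into `scores` (hypothetical per-move scores).

    With probability `epsilon` a uniformly random index is returned
    (explore); otherwise the index of the highest score is returned
    (exploit).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if rng.random() < epsilon:
        # Explore: ignore the scores and pick any move at random.
        return rng.randrange(len(scores))
    # Exploit: pick the move the policy currently rates highest.
    return max(range(len(scores)), key=scores.__getitem__)
```

With `epsilon=0.0` the choice is always the top-scored move, and with `epsilon=1.0` it is always random, so the parameter trades off following the expert policy against visiting positions it would not normally reach.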

Note: I’ll be continually updating this GitHub repository as I do more tests!

submitted by /u/PhilipFelizarta