[D] Should I over-sample rare episodes with successful exploration?

I am using deep RL (mostly policy gradients) in a simulated, discrete, Sokoban-style environment.

Alex-the-agent is rewarded for finding the shortest possible solution, and is trained on progressively harder, more intricate maps. After a while, exploration becomes very difficult, and it takes millions of attempts to complete an episode with a slightly better score. To be clear, performance is not plateauing; each improvement just takes far longer to find through exploration.

Should I be "over-sampling" these increasingly rare successful score improvements?

I use PG since it works, but I am open to trying value-based techniques.
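To make the question concrete, here is a rough sketch of what I mean by over-sampling: keep a small buffer of the best-scoring episodes and mix a fraction of them into every training batch. Everything here is hypothetical and for illustration only (`Episode`, `EliteEpisodeBuffer`, `mix_batch`, the `capacity` and `elite_fraction` parameters, and the rank-based weights); it is not what I currently run. One caveat I am aware of: replayed episodes come from an older policy, so feeding them into a plain on-policy PG update biases the gradient unless that is corrected for in some way.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Episode:
    score: float                                      # higher is better (e.g. shorter solution)
    transitions: list = field(default_factory=list)   # hypothetical (state, action, reward) tuples


class EliteEpisodeBuffer:
    """Keeps the top `capacity` episodes by score and over-samples them."""

    def __init__(self, capacity=32, seed=0):
        self.capacity = capacity
        self.episodes = []               # always sorted best-first
        self.rng = random.Random(seed)

    def add(self, episode):
        """Insert an episode; return True if it survived truncation."""
        self.episodes.append(episode)
        self.episodes.sort(key=lambda e: e.score, reverse=True)
        del self.episodes[self.capacity:]
        return any(e is episode for e in self.episodes)

    def mix_batch(self, fresh, elite_fraction=0.25):
        """Return the fresh episodes plus extra replayed elite episodes.

        The number of replayed episodes is elite_fraction * len(fresh),
        drawn with replacement and weighted by rank (best episode most likely).
        """
        k = int(round(elite_fraction * len(fresh)))
        if not self.episodes or k == 0:
            return list(fresh)
        weights = [1.0 / (rank + 1) for rank in range(len(self.episodes))]
        replayed = self.rng.choices(self.episodes, weights=weights, k=k)
        return list(fresh) + replayed


# Usage: record every finished episode, then build each training batch.
buffer = EliteEpisodeBuffer(capacity=2)
for score in (1.0, 5.0, 3.0):
    buffer.add(Episode(score=score))
fresh_rollouts = [Episode(score=0.0) for _ in range(4)]
batch = buffer.mix_batch(fresh_rollouts, elite_fraction=0.5)
```

The `elite_fraction` knob is the "how much to over-sample" part of my question: at 0 this is ordinary on-policy training, and larger values lean harder on the rare successes.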

submitted by /u/so_tiredso_tired