[D] What are the differences and which one is better: noisy networks or parameter space noise?
Noisy Networks for Exploration
Parameter Space Noise for Exploration
While learning about various RL algorithms, I successfully implemented noisy networks for my DQN project. For each main parameter I added a second parameter that serves as the standard deviation of the Gaussian noise added to it during a forward pass, and these extra parameters were optimized through gradient descent along with the main ones. The results were satisfying, and I finished the project with a decent agent.
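To make the setup concrete, here is a minimal sketch of the forward pass of such a layer, assuming independent Gaussian noise per weight. The class name `NoisyLinear`, the default `sigma_init`, and the initialization range are illustrative assumptions, not my actual project code, and the sketch leaves out the gradient updates that would train both `mu` and `sigma`.

```python
import random


class NoisyLinear:
    """Linear layer whose effective weights are mu + sigma * eps, eps ~ N(0, 1)."""

    def __init__(self, n_in, n_out, sigma_init=0.017, seed=0):
        self.rng = random.Random(seed)
        bound = (3.0 / n_in) ** 0.5  # assumed init range for the means
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(n_in)]
                     for _ in range(n_out)]
        self.b_mu = [self.rng.uniform(-bound, bound) for _ in range(n_out)]
        # One sigma per weight and per bias; in training these are learned too.
        self.w_sigma = [[sigma_init] * n_in for _ in range(n_out)]
        self.b_sigma = [sigma_init] * n_out

    def forward(self, x, noisy=True):
        """Return the layer output; fresh noise is drawn on every noisy call."""
        out = []
        for j in range(len(self.b_mu)):
            eps_b = self.rng.gauss(0.0, 1.0) if noisy else 0.0
            total = self.b_mu[j] + self.b_sigma[j] * eps_b
            for i, x_i in enumerate(x):
                eps_w = self.rng.gauss(0.0, 1.0) if noisy else 0.0
                total += (self.w_mu[j][i] + self.w_sigma[j][i] * eps_w) * x_i
            out.append(total)
        return out
```

Calling `forward(x, noisy=False)` gives the deterministic output from the means alone, which is what one would typically use at evaluation time.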
Recently, I turned my DQN algorithm into a DDPG/D4PG algorithm. I used the same noisy network method for exploration, and it still gave me fine agents from time to time. However, those agents often did not perform significantly better than the ones that used action space noise from an Ornstein-Uhlenbeck process, and sometimes they even performed worse.
Trying to find out what I might have gotten wrong or misunderstood, I looked up the original paper to read it once more, thoroughly. Then I found an article posted by OpenAI about parameter space noise. At first glance, I thought this was the same thing as noisy networks. In fact, I still thought they were the same after reading through that article and skimming the paper on parameter space noise.
Today, after reading the parameter space noise paper more carefully, I finally realized that these are two similar but distinct ways of adding noise in parameter space. The method described in “Parameter Space Noise for Exploration” samples Gaussian noise at the beginning of each episode and adapts a single scalar standard deviation according to how far the actions of the perturbed policy are from those of the unperturbed policy. Noisy nets, in contrast, add a learnable noise scale for each parameter and optimize those scales through gradient descent, instead of using a scalar standard deviation with an adaptive scaling rule.
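As I understand it, the adaptive scaling in the parameter space noise method could be sketched roughly as follows. The function names and the flat list of parameters are simplifying assumptions for illustration. The multiplicative rule with a factor `alpha` slightly above 1 and the root-mean-square action distance are my reading of the paper's DDPG variant, not a verified reference implementation.

```python
import math
import random


def perturb(params, sigma, rng):
    """Add N(0, sigma^2) noise to every parameter; done once per episode."""
    return [p + sigma * rng.gauss(0.0, 1.0) for p in params]


def action_distance(clean_actions, perturbed_actions):
    """Root-mean-square difference between two batches of action vectors."""
    sq, count = 0.0, 0
    for a, b in zip(clean_actions, perturbed_actions):
        for a_i, b_i in zip(a, b):
            sq += (a_i - b_i) ** 2
            count += 1
    return math.sqrt(sq / count)


def adapt_sigma(sigma, distance, target, alpha=1.01):
    """Grow sigma when the perturbation is too weak, shrink it when too strong."""
    return sigma * alpha if distance <= target else sigma / alpha
```

The key contrast with noisy nets is visible here: `sigma` is a single scalar steered by a heuristic in action space, and no gradient ever flows into it.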
So, I have tried noisy nets so far, but I haven’t used this alternative parameter space noise method.
Are there other differences between these two methods? Do they perform differently? Is one better than the other? What are the main applications for each approach and what would be the best option for DDPG/D4PG?
submitted by /u/Dragonoken