[D] Why does the same reinforcement learning algorithm work for CartPole but not for LunarLander (and others)?
Hi Reddit community, I'm currently teaching myself reinforcement learning. I downloaded a few code samples to try out and get a feel for how they work. One of them [code A] uses A3C on CartPole-v0, and it learns very well. Another [code B] uses DQN on LunarLander-v2, and it also manages to train a smart agent.
Then I changed the environment in code A (A3C) to LunarLander-v2 and to MountainCar-v0. There weren't any errors, but the agent failed to learn. Likewise, when I changed the environment in code B (DQN) to CartPole-v0 and to MountainCar-v0, it didn't learn either.
Why is that? Is it because different environments have different reward systems? Or is it that the hyperparameters that worked for CartPole-v0 do not work for LunarLander-v2?
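To make the reward-system question concrete, here is a rough sketch and not a diagnosis of either code sample. It assumes the per-step rewards documented for these Gym environments (+1 per step for CartPole-v0, -1 per step for MountainCar-v0, both capped at 200 steps) and hard-codes them so it runs without gym installed. The function names and the episode lengths are made up for illustration.

```python
# Reward rules as documented for the classic Gym environments, hard-coded
# here so the sketch runs without gym. Helper names are hypothetical.
MAX_STEPS = 200  # episode step cap for both CartPole-v0 and MountainCar-v0


def cartpole_return(steps_survived):
    """CartPole-v0: +1 for every step the pole stays up."""
    return float(min(steps_survived, MAX_STEPS))


def mountaincar_return(steps_to_goal=None):
    """MountainCar-v0: -1 per step until the goal is reached or time runs out.

    Pass None when the car never reaches the flag within the step cap.
    """
    if steps_to_goal is None or steps_to_goal >= MAX_STEPS:
        return -float(MAX_STEPS)
    return -float(steps_to_goal)


def spread(values):
    """Difference between the best and worst episode return."""
    return max(values) - min(values)


# Made-up early-training episodes in which the agent acts almost randomly.
# CartPole episodes differ in length, so returns differ between episodes.
cartpole_returns = [cartpole_return(n) for n in (9, 14, 23, 41)]
# A near-random MountainCar agent rarely reaches the flag, so every
# episode here is assumed to time out.
mountaincar_returns = [mountaincar_return(None) for _ in range(4)]

print("CartPole returns:", cartpole_returns, "spread:", spread(cartpole_returns))
print("MountainCar returns:", mountaincar_returns, "spread:", spread(mountaincar_returns))
```

Under these assumptions the CartPole returns vary from episode to episode, while the MountainCar returns are identical until the car first reaches the flag, so the same exploration settings may give the agent very little to learn from.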
submitted by /u/ErmJustSaying