
[D] (RL) Questions regarding log std and clipping outputs

I’m trying to implement A2C in TF 2.0 without using additional libraries like baselines, but I’m having problems with the standard deviation (std) and with clipping.

First of all, I made a variable log_std (rather than std itself) to avoid zeros in the calculations; whenever I need the std I exponentiate it, as in tf.exp(log_std). I saw this trick in CS294-112 homework 2, so I’m using it. But when I gather the trainable variables for training, I include only log_std. Is that okay? Since the variable I created is the un-exponentiated one, I can only update it, not its exponentiated form, and I worry the derivatives will be different when the network back-propagates.
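To make this concrete, here’s a minimal sketch of the setup I’m describing (layer sizes, shapes, and names are made up for illustration, not my actual code):

```python
import math
import tensorflow as tf

# Sketch of a diagonal-Gaussian policy head: the network outputs the mean,
# and log_std is a free trainable variable (the CS294-112 trick).
act_dim = 2
mean_net = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(act_dim),
])
log_std = tf.Variable(tf.zeros(act_dim), trainable=True)  # trained directly, never std

def gaussian_log_prob(mean, actions):
    std = tf.exp(log_std)  # exponentiate only where the std is actually needed
    quad = -0.5 * tf.reduce_sum(((actions - mean) / std) ** 2, axis=-1)
    return quad - tf.reduce_sum(log_std) - 0.5 * act_dim * math.log(2.0 * math.pi)

obs = tf.random.normal((8, 4))             # dummy batch of observations
actions = tf.random.normal((8, act_dim))   # dummy batch of actions
with tf.GradientTape() as tape:
    logp = gaussian_log_prob(mean_net(obs), actions)
    loss = -tf.reduce_mean(logp)  # stand-in for the actual A2C policy loss
# The tape records the tf.exp op, so gradients flow from std back to log_std.
grads = tape.gradient(loss, mean_net.trainable_variables + [log_std])
```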

Second, I’m clipping sampled actions with tf.clip_by_value(ac, env.action_space.low, env.action_space.high), but I’m not sure how to constrain the network’s output itself (the network outputs the mean of a Gaussian distribution). As far as I know, the policy should only produce actions between env.action_space.low and env.action_space.high, but a Gaussian has unbounded support, so that constraint can’t hold exactly. What is the usual way to bound the network’s output when using a Gaussian (or another distribution)?
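Here’s a sketch of the two approaches I can think of (the bounds and values below are made up for illustration):

```python
import numpy as np
import tensorflow as tf

# Made-up bounds standing in for env.action_space.low / env.action_space.high.
low = np.array([-1.0, -2.0], dtype=np.float32)
high = np.array([1.0, 2.0], dtype=np.float32)

mean = tf.constant([[0.3, 2.5]])   # unbounded network output (Gaussian mean)
std = tf.constant([0.5, 0.5])
raw_action = mean + std * tf.random.normal(tf.shape(mean))  # sample from N(mean, std)

# Option 1: leave the Gaussian unbounded and clip only the *sampled* action
# right before stepping the environment (what I'm doing now).
action = tf.clip_by_value(raw_action, low, high)

# Option 2: squash the *mean* into [low, high] with tanh so the network can
# never request an out-of-range mean; sampled actions may still need a clip.
squashed_mean = low + 0.5 * (tf.tanh(mean) + 1.0) * (high - low)
```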

Final question: do you think I should just use RL libraries like Tensorforce or baselines instead?

submitted by /u/wongongv