[D] (RL) Questions regarding log std and clipping outputs

I’m trying to implement A2C in TF 2.0 without additional libraries such as baselines, but I’m having problems with the standard deviation (std) and with clipping.

First, I made a variable log_std to avoid zeros in calculations (I exponentiate it whenever I use it, as in tf.exp(log_std)). I saw this trick in CS294-112 homework 2, so I’m using it. When I gather the trainable variables for training, I include only log_std itself. Is that okay? Since the variable I created is the un-exponentiated one, I think it is the only thing I can update, not the exponentiated form. But I feel like I shouldn’t do this, because the derivative values will be different when the NN back-propagates.
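To make the gradient concern concrete, here is a minimal framework-free sketch (plain Python, not TF code) of a 1-D Gaussian log-probability parameterized by log_std. It compares the chain-rule gradient with respect to log_std against a finite-difference estimate. The function names and the numeric values are assumptions chosen for illustration. In TF 2.x, tf.GradientTape applies the same chain rule through tf.exp when the exponentiation happens inside the tape, so the gradient it reports for the log_std variable already accounts for the exponentiation.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a 1-D Gaussian, parameterized by log_std."""
    std = math.exp(log_std)  # strictly positive for any real log_std
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


def grad_wrt_log_std(action, mean, log_std):
    """Analytic d(log_prob)/d(log_std) via the chain rule through exp."""
    std = math.exp(log_std)
    z = (action - mean) / std
    # d(log_prob)/d(std) = (z^2 - 1) / std, and d(std)/d(log_std) = std,
    # so the product simplifies to z^2 - 1.
    return z * z - 1.0


def numeric_grad(action, mean, log_std, eps=1e-6):
    """Central finite-difference estimate of the same derivative."""
    up = gaussian_log_prob(action, mean, log_std + eps)
    down = gaussian_log_prob(action, mean, log_std - eps)
    return (up - down) / (2.0 * eps)


# Hypothetical values, chosen only for the demonstration.
action, mean, log_std = 1.3, 0.2, -0.5
analytic = grad_wrt_log_std(action, mean, log_std)
numeric = numeric_grad(action, mean, log_std)
print("analytic:", analytic, "numeric:", numeric)

# One gradient-ascent step applied directly to log_std (hypothetical step size).
log_std_updated = log_std + 0.01 * analytic
print("std before:", math.exp(log_std), "std after:", math.exp(log_std_updated))
```

The two gradient values should agree closely, which suggests that updating log_std itself is consistent: the optimizer sees the derivative with respect to the variable that actually exists, and the standard deviation stays positive after every step.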

Second, I’m clipping actions with tf.clip_by_value(ac, env.action_space.low, env.action_space.high). But I’m not sure how to clip the NN’s output (the NN is set to output the mean of a Gaussian distribution). As far as I know, the NN should always output distributions whose maximum is env.action_space.high and whose minimum is env.action_space.low. But since I’m using a Gaussian distribution, whose support is unbounded, it is impossible to apply this constraint. What is the usual way to clip the NN’s output when using a Gaussian distribution (or other distributions)?
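The setup described above can be sketched in plain Python as follows. This is only an illustration of one possible arrangement, not an answer to what is usual: the policy samples from an unbounded Gaussian, and only the copy of the action sent to the environment is clipped. The helper names, the bounds of -2.0 and 2.0, and the choice to keep the unclipped sample around are all assumptions made for the example.

```python
import math
import random


def clip(value, low, high):
    """Scalar analogue of tf.clip_by_value."""
    return max(low, min(high, value))


def sample_action(mean, log_std, low, high, rng):
    """Return (raw_action, env_action) for a 1-D Gaussian policy."""
    std = math.exp(log_std)
    raw_action = rng.gauss(mean, std)         # unbounded Gaussian sample
    env_action = clip(raw_action, low, high)  # value the environment receives
    return raw_action, env_action


# Hypothetical bounds standing in for env.action_space.low / .high.
low, high = -2.0, 2.0
rng = random.Random(0)

# A mean near the upper bound, so some samples fall outside the valid range.
samples = [sample_action(1.8, 0.0, low, high, rng) for _ in range(1000)]
num_clipped = sum(1 for raw, env in samples if raw != env)
print("samples clipped:", num_clipped, "of", len(samples))
```

In this sketch the mean itself is never clipped, so the distribution stays an ordinary Gaussian and only the executed action is bounded. Whether the log-probability should then be evaluated on the raw or the clipped action is a design choice the sketch leaves open.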

Final question: do you think I should use RL libraries like tensorforce and baselines?

submitted by /u/wongongv
